DEFERRING AND COMBINING WRITE BARRIERS FOR A
GARBAGE-COLLECTED HEAP 

FIELD OF THE INVENTION 

The present invention is directed to memory management. It particularly con- 
cerns what has come to be known as "garbage collection." 

BACKGROUND OF THE INVENTION 

In the field of computer systems, considerable effort has been expended on the 
task of allocating memory to data objects. For the purposes of this discussion, the term 
object refers to a data structure represented in a computer system's memory. Other terms 
sometimes used for the same concept are record and structure. An object may be identi- 
fied by a reference, a relatively small amount of information that can be used to access 
the object. A reference can be represented as a "pointer" or a "machine address," which 
may require, for instance, only sixteen, thirty-two, or sixty-four bits of information, al- 
though there are other ways to represent a reference. 

In some systems, which are usually known as "object oriented," objects may have 
associated methods, which are routines that can be invoked by reference to the object. 
They also may belong to a class, which is an organizational entity that may contain 
method code or other information shared by all objects belonging to that class. In the 
discussion that follows, though, the term object will not be limited to such structures; it 
will additionally include structures with which methods and classes are not associated. 

The invention to be described below is applicable to systems that allocate mem- 
ory to objects dynamically. Not all systems employ dynamic allocation. In some com- 
puter languages, source programs can be so written that all objects to which the pro- 
gram's variables refer are bound to storage locations at compile time. This storage-
allocation approach, sometimes referred to as "static allocation," is the policy tradition-
ally used by the Fortran programming language, for example.

Even for compilers that are thought of as allocating objects only statically, of 
course, there is often a certain level of abstraction to this binding of objects to storage 
locations. Consider the typical computer system 100 depicted in Fig. 1, for example. 
Data, and instructions for operating on them, that a microprocessor 110 uses may reside
in on-board cache memory or be received from further cache memory 120, possibly 
through the mediation of a cache controller 130. That controller 130 can in turn receive 
such data from system read/write memory ("RAM") 140 through a RAM controller 150 
or from various peripheral devices through a system bus 160. Additionally, instructions 
and data may be received from other computer systems via a communication interface 
180. The memory space made available to an application program may be "virtual" in 
the sense that it may actually be considerably larger than RAM 140 provides. So the 
RAM contents will be swapped to and from a system disk 170. 

Additionally, the actual physical operations performed to access some of the 
most-recently visited parts of the process's address space often will actually be performed 
in the cache 120 or in a cache on board microprocessor 110 rather than on the RAM 140, 
with which those caches swap data and instructions just as RAM 140 and system 
disk 170 do with each other. 

A further level of abstraction results from the fact that an application will often be 
run as one of many processes operating concurrently with the support of an underlying 
operating system. As part of that system's memory management, the application's mem- 
ory space may be moved among different actual physical locations many times in order to 
allow different processes to employ shared physical memory devices. That is, the loca- 
tion specified in the application's machine code may actually result in different physical 
locations at different times because the operating system adds different offsets to the ma- 
chine-language-specified location. 

The use of static memory allocation in writing certain long-lived applications 
makes it difficult to restrict storage requirements to the available memory space. Abiding 
by space limitations is easier when the platform provides for dynamic memory allocation,
i.e., when memory space to be allocated to a given object is determined only at run time. 

Dynamic allocation has a number of advantages, among which is that the run-time 
system is able to adapt allocation to run-time conditions. For example, the programmer 
can specify that space should be allocated for a given object only in response to a par- 
ticular run-time condition. The C-language library function malloc() is often used for this 
purpose. Conversely, the programmer can specify conditions under which memory pre- 
viously allocated to a given object can be reclaimed for reuse. The C-language library 
function free() results in such memory reclamation. Because dynamic allocation provides 
for memory reuse, it facilitates generation of large or long-lived applications, which over 
the course of their lifetimes may employ objects whose total memory requirements would 
greatly exceed the available memory resources if they were bound to memory locations 
statically. 

Particularly for long-lived applications, though, allocation and reclamation of dy- 
namic memory must be performed carefully. If the application fails to reclaim unused 
memory — or, worse, loses track of the address of a dynamically allocated segment of 
memory — its memory requirements will grow over time to exceed the system's available 
memory. This kind of error is known as a "memory leak." Another kind of error occurs 
when an application reclaims memory for reuse even though it still maintains a reference 
to that memory. If the reclaimed memory is reallocated for a different purpose, the appli- 
cation may inadvertently manipulate the same memory in multiple inconsistent ways. 
This kind of error is known as a "dangling reference." 
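
By way of illustration only, the following C sketch shows explicit allocation and reclamation with malloc() and free(), together with one common defensive idiom (clearing the caller's pointer after free()) that guards against the dangling reference just described. The helper names make_buffer and release_buffer and their parameters are assumptions introduced for this example; they are not part of any system described here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a zeroed buffer only when a run-time
 * condition ("needed") holds. Returns NULL when nothing is allocated
 * or when malloc() fails. */
unsigned char *make_buffer(int needed, size_t size)
{
    if (!needed || size == 0)
        return NULL;
    unsigned char *buf = malloc(size);
    if (buf != NULL)
        memset(buf, 0, size);
    return buf;
}

/* Hypothetical helper: reclaim the buffer and clear the caller's
 * pointer, so the caller cannot later use a dangling reference.
 * Forgetting to call this at all would be a memory leak. */
void release_buffer(unsigned char **buf)
{
    assert(buf != NULL);
    free(*buf);          /* free(NULL) is defined to do nothing */
    *buf = NULL;
}
```

Clearing the pointer protects only the one copy of the reference that is passed in; any other copies held elsewhere in the application would still dangle, which is the global-knowledge problem that motivates automatic reclamation.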

A way of reducing the likelihood of such leaks and related errors is to provide 
memory-space reclamation in a more automatic manner. Techniques used by systems 
that reclaim memory space automatically are commonly referred to as garbage collec- 
tion. Garbage collectors operate by reclaiming space that they no longer consider "reach- 
able." Statically allocated objects represented by a program's global variables are nor- 
mally considered reachable throughout a program's life. Such objects are not ordinarily 
stored in the garbage collector's managed memory space, but they may contain refer- 
ences to dynamically allocated objects that are, and such objects are considered reach-
able. Clearly, an object referred to in the processor's call stack is reachable, as is an ob-
ject referred to by register contents. And an object referred to by any reachable object is
also reachable. As used herein, a call stack is a data structure corresponding to a process 
or thread (i.e., an application), whereby the call stack comprises a sequence of frames that 
store state information, such as register contents and program counter values, associated 
with nested routines within the process or thread. 

The use of garbage collectors is advantageous because, whereas a programmer 
working on a particular sequence of code can perform his task creditably in most respects 
with only local knowledge of the application at any given time, memory allocation and 
reclamation require a global knowledge of the program. Specifically, a programmer 
dealing with a given sequence of code does tend to know whether some portion of mem- 
ory is still in use for that sequence of code, but it is considerably more difficult for him to 
know what the rest of the application is doing with that memory. By tracing references 
from some conservative notion of a root set, e.g., global variables, registers, and the call 
stack, automatic garbage collectors obtain global knowledge in a methodical way. By 
using a garbage collector, the programmer is relieved of the need to worry about the ap- 
plication's global state and can concentrate on local-state issues, which are more man- 
ageable. The result is applications that are more robust, having no dangling references 
and fewer memory leaks. 

Garbage collection mechanisms can be implemented by various parts and levels 
of a computing system. One approach is simply to provide them as part of a batch com- 
piler's output. Consider Fig. 2's simple batch-compiler operation, for example. A com- 
puter system executes in accordance with compiler object code and therefore acts as a 
compiler 200. The compiler object code is typically stored on a medium such as Fig. 1's
system disk 170 or some other machine-readable medium, and it is loaded into RAM 140 
to configure the computer system to act as a compiler. In some cases, though, the com- 
piler object code's persistent storage may instead be provided in a server system remote 
from the machine that performs the compiling. The electrical signals that carry the digi- 
tal data by which the computer systems exchange that code are examples of the kinds of 
electromagnetic signals by which the computer instructions can be communicated. Oth-
ers include radio waves, microwaves, and both visible and invisible light.

The input to the compiler is the application source code, and the end product of 
the compiler process is application object code. This object code defines an applica- 
tion 210, which typically operates on input such as mouse clicks, etc., to generate a dis- 
play or some other type of output. This object code implements the relationship that the 
programmer intends to specify by his application source code. In one approach to gar- 
bage collection, the compiler 200, without the programmer's explicit direction, addition- 
ally generates code that automatically reclaims unreachable memory space. 

Even in this simple case, though, there is a sense in which the application does not 
itself provide the entire garbage collector. Specifically, the application will typically call 
upon the underlying operating system's memory-allocation functions. And the operating 
system may in turn take advantage of various hardware that lends itself particularly to use 
in garbage collection. So even a very simple system may disperse the garbage collection 
mechanism over a number of computer system layers. 

To get some sense of the variety of system components that can be used to im- 
plement garbage collection, consider Fig. 3's example of a more complex way in which 
various levels of source code can result in the machine instructions that a processor exe- 
cutes. In the Fig. 3 arrangement, the human applications programmer produces source 
code 310 written in a high-level language. A compiler 320 typically converts that code 
into "class files." These files include routines written in instructions, called "byte 
codes" 330, for a "virtual machine" that various processors can be configured to emulate. 
This conversion into byte codes is almost always separated in time from those codes' 
execution, so Fig. 3 divides the sequence into a "compile-time environment" 300 separate 
from a "run-time environment" 340, in which execution occurs. One example of a high-
level language for which compilers are available to produce such virtual-machine in-
structions is the Java™ programming language. (Java is a trademark or registered
trademark of Sun Microsystems, Inc., in the United States and other countries.)

Most typically, the class files' byte-code routines are executed by a processor un- 
der control of a virtual-machine process 350. That process emulates a virtual machine 
from whose instruction set the byte codes are drawn. As is true of the compiler 320, the
virtual-machine process 350 may be specified by code stored on a local disk or some
other machine-readable medium from which it is read into Fig. 1's RAM 140 to config-
ure the computer system to implement the garbage collector and otherwise act as a virtual
machine. Again, though, that code's persistent storage may instead be provided by a
server system remote from the processor that implements the virtual machine, in which
case the code would be transmitted, e.g., electrically or optically, to the virtual-machine-
implementing processor.

In some implementations, much of the virtual machine's action in executing these 
byte codes is most like what those skilled in the art refer to as "interpreting," so Fig. 3 
depicts the virtual machine as including an "interpreter" 360 for that purpose. In addition 
to or instead of running an interpreter, many virtual-machine implementations actually 
compile the byte codes concurrently with the resultant object code's execution, so Fig. 3 
depicts the virtual machine as additionally including a "just-in-time" compiler 370. The 
arrangement of Fig. 3 differs from Fig. 2 in that the compiler 320 for converting the hu- 
man programmer's code does not contribute to providing the garbage collection function; 
that results largely from the virtual machine 350's operation. 

Those skilled in the art will recognize that both of these organizations are merely
exemplary, and many modern systems employ hybrid mechanisms, which partake of the 
characteristics of traditional compilers and traditional interpreters both. The invention to 
be described below is applicable independently of whether a batch compiler, a just-in- 
time compiler, an interpreter, or some hybrid is employed to process source code. In the 
remainder of this application, therefore, we will use the term compiler to refer to any 
such mechanism, even if it is what would more typically be called an interpreter. 

Now, some of the functionality that source-language constructs specify can be 
quite complicated, requiring many machine-language instructions for their implementa- 
tion. One quite-common example is a source-language instruction that calls for 64-bit 
arithmetic on a 32-bit machine. More germane to the present invention is the operation 
of dynamically allocating space to a new object; this may require determining whether 
enough free memory space is available to contain the new object and reclaiming space if
there is not. 

In such situations, the compiler may produce "inline" code to accomplish these 
operations. That is, all object-code instructions for carrying out a given source-code- 
prescribed operation will be repeated each time the source code calls for the operation. 
But inlining runs the risk that "code bloat" will result if the operation is invoked at many 
source-code locations. 

The natural way of avoiding this result is instead to provide the operation's im- 
plementation as a procedure, i.e., a single code sequence that can be called from any lo- 
cation in the program. In the case of compilers, a collection of procedures for imple- 
menting many types of source-code-specified operations is called a runtime system for 
the language. The compiler and its runtime system are designed together so that the 
compiler "knows" what runtime-system procedures are available in the target computer 
system and can cause desired operations simply by including calls to procedures that the 
target system already contains. To represent this fact, Fig. 3 includes block 380 to show 
that the compiler's output makes calls to the runtime system as well as to the operating 
system 390, which consists of procedures that are similarly system resident but are not 
compiler-dependent. 

Although the Fig. 3 arrangement is a popular one, it is by no means universal, and 
many further implementation types can be expected. Proposals have even been made to 
implement the virtual machine 350's behavior in a hardware processor, in which case the 
hardware itself would provide some or all of the garbage-collection function. In short, 
garbage collectors can be implemented in a wide range of combinations of hardware 
and/or software. 

By implementing garbage collection, a computer system can greatly reduce the 
occurrence of memory leaks and other software deficiencies in which human program- 
ming frequently results. But it can also have significant adverse performance effects if it 
is not implemented carefully. To distinguish the part of the program that does "useful" 
work from that which does the garbage collection, the term mutator is sometimes used in 
discussions of these effects; from the collector's point of view, what the mutator does is
mutate active data structures' connectivity. 

Some garbage collection approaches rely heavily on interleaving garbage collec- 
tion steps among mutator steps. In one type of garbage collection approach, for instance, 
the mutator operation of writing a reference is followed immediately by garbage collector
steps used to maintain a reference count in the referred-to object's header, and code for subsequent
new-object storage includes steps for finding space occupied by objects whose reference 
count has fallen to zero. Obviously, such an approach can slow mutator operation sig- 
nificantly. 
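
By way of illustration only, the following C sketch shows how reference-count maintenance can be interleaved with every reference write. The names Obj, write_ref, obj_release, and live_objects are assumptions introduced for this example. For brevity the sketch reclaims an object as soon as its count reaches zero, rather than leaving the search for zero-count objects to subsequent allocation code as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object layout: a header holding the reference count,
 * followed by a single reference field. */
typedef struct Obj {
    size_t refcount;      /* number of references to this object */
    struct Obj *field;    /* one outgoing reference, or NULL */
} Obj;

int live_objects = 0;     /* illustration only: objects not yet reclaimed */

Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    live_objects++;
    return o;
}

void obj_release(Obj *o);

/* The mutator's reference write plus the collector steps that follow
 * it: count the new target, store, then uncount the old target. */
void write_ref(Obj **slot, Obj *target)
{
    Obj *old = *slot;
    if (target != NULL)
        target->refcount++;
    *slot = target;
    if (old != NULL)
        obj_release(old);
}

/* Drop one reference; reclaim the object when its count reaches zero. */
void obj_release(Obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        write_ref(&o->field, NULL);   /* release what it referred to */
        free(o);
        live_objects--;
    }
}
```

Every call to write_ref performs several extra memory operations beyond the store itself, which is the source of the mutator slowdown noted above.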

Other approaches therefore interleave very few garbage collector-related instruc- 
tions into the main mutator process but instead interrupt it from time to time to perform 
garbage collection cycles, in which the garbage collector finds unreachable objects and 
reclaims their memory space for reuse. Such an approach will be assumed in discussing 
Fig. 4's depiction of a simple garbage collection operation. Within the memory space 
allocated to a given application is a part 420 managed by automatic garbage collection. 
As used hereafter, all dynamically allocated memory associated with a process or thread 
will be referred to as its heap. During the course of the application's execution, space is 
allocated for various objects 402, 404, 406, 408, and 410. Typically, the mutator allo- 
cates space within the heap by invoking the garbage collector, which at some level man- 
ages access to the heap. Basically, the mutator asks the garbage collector for a pointer to 
a heap region where it can safely place the object's data. The garbage collector keeps 
track of the fact that the thus-allocated region is occupied. It will refrain from allocating 
that region in response to any other request until it determines that the mutator no longer 
needs the region allocated to that object. 

Garbage collectors vary as to which objects they consider reachable and unreach- 
able. For the present discussion, though, an object will be considered "reachable" if it is 
referred to, as object 402 is, by a reference in a root set 400. The root set consists of ref- 
erence values stored in the mutator's threads' call stacks, the central processing unit 
(CPU) registers, and global variables outside the garbage collected heap. An object is 
also reachable if it is referred to, as object 406 is, by another reachable object (in this 
case, object 402). Objects that are not reachable can no longer affect the program, so it is
safe to re-allocate the memory spaces that they occupy. 

A typical approach to garbage collection is therefore to identify all reachable ob- 
jects and reclaim any previously allocated memory that the reachable objects do not oc- 
cupy. A typical garbage collector may identify reachable objects by tracing references 
from the root set 400. For the sake of simplicity, Fig. 4 depicts only one reference from 
the root set 400 into the heap 420. (Those skilled in the art will recognize that there are 
many ways to identify references, or at least data contents that may be references.) The 
collector notes that the root set points to object 402, which is therefore reachable, and that 
reachable object 402 points to object 406, which therefore is also reachable. But those 
reachable objects point to no other objects, so objects 404, 408, and 410 are all unreach- 
able, and their memory space may be reclaimed. 
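
By way of illustration only, the tracing just described can be sketched in C as a recursive marking pass. The Node layout, the MAX_REFS limit, and the function names are assumptions introduced for this example; a practical collector would typically avoid unbounded recursion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 2

/* Hypothetical heap object: a mark bit plus a few reference fields. */
typedef struct Node {
    int id;                        /* label, e.g. 402 */
    bool marked;                   /* set when found reachable */
    struct Node *refs[MAX_REFS];   /* outgoing references (may be NULL) */
} Node;

/* Mark obj and everything reachable from it by following references. */
void mark(Node *obj)
{
    if (obj == NULL || obj->marked)
        return;                    /* nothing to do, or already visited */
    obj->marked = true;
    for (size_t i = 0; i < MAX_REFS; i++)
        mark(obj->refs[i]);
}

/* Trace from every root, then count the objects left unmarked; their
 * space is what a collector could reclaim. */
size_t count_unreachable(Node **roots, size_t nroots,
                         Node *heap, size_t nheap)
{
    assert(heap != NULL);
    for (size_t i = 0; i < nheap; i++)
        heap[i].marked = false;
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    size_t garbage = 0;
    for (size_t i = 0; i < nheap; i++)
        if (!heap[i].marked)
            garbage++;
    return garbage;
}
```

With a single root referring to an object labeled 402, which in turn refers to one labeled 406, the pass marks exactly those two objects and leaves the remaining three unmarked.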

To avoid excessive heap fragmentation, some garbage collectors additionally re- 
locate reachable objects. Fig. 5 shows a typical approach for this "copying" type of gar- 
bage collection. The heap is partitioned into two halves, hereafter called "semi-spaces." 
For one garbage collection cycle, all objects are allocated in one semi-space 510, leaving 
the other semi-space 520 free. When the garbage collection cycle occurs, objects identi- 
fied as reachable are "evacuated" to the other semi-space 520, so all of semi-space 510 is 
then considered free. Once the garbage collection cycle has occurred, all new objects are 
allocated in the lower semi-space 520 until yet another garbage collection cycle occurs, at 
which time the reachable objects are evacuated back to the upper semi-space 510. 

Although this relocation requires the extra steps of copying the reachable objects 
and updating references to them, it tends to be quite efficient, since most new objects 
quickly become unreachable, so most of the current semi-space is actually garbage. That 
is, only a relatively few, reachable objects need to be relocated, after which the entire 
semi-space contains only garbage and can be pronounced free for reallocation. 
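
By way of illustration only, the following C sketch shows a copying collection over two semi-spaces, using the well-known Cheney-style scan of the destination space. The Cell and Heap layouts, the SEMI_CAP capacity, and the function names are assumptions introduced for this example; objects are fixed-size with a single reference field to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define SEMI_CAP 8   /* objects per semi-space (illustrative) */

/* Hypothetical fixed-size object with one reference field. */
typedef struct Cell {
    struct Cell *ref;       /* outgoing reference, or NULL */
    struct Cell *forward;   /* new address once evacuated, else NULL */
    int value;
} Cell;

typedef struct {
    Cell   space[2][SEMI_CAP];
    int    current;         /* index of the semi-space in use */
    size_t used;            /* objects allocated in that semi-space */
} Heap;

Cell *heap_alloc(Heap *h, int value)
{
    if (h->used == SEMI_CAP)
        return NULL;                      /* caller should collect */
    Cell *c = &h->space[h->current][h->used++];
    c->ref = NULL;
    c->forward = NULL;
    c->value = value;
    return c;
}

/* Copy one object into the destination space unless it has already
 * been copied; either way, return its new address. */
Cell *evacuate(Cell *c, Cell *to, size_t *next)
{
    if (c == NULL)
        return NULL;
    if (c->forward == NULL) {
        to[*next] = *c;
        to[*next].forward = NULL;
        c->forward = &to[(*next)++];
    }
    return c->forward;
}

/* Evacuate everything reachable from the roots, update references to
 * the moved objects, then make the destination the current space. */
void heap_collect(Heap *h, Cell **roots, size_t nroots)
{
    assert(h != NULL);
    Cell *to = h->space[1 - h->current];
    size_t next = 0, scan = 0;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = evacuate(roots[i], to, &next);
    while (scan < next) {                 /* scan the copied objects */
        to[scan].ref = evacuate(to[scan].ref, to, &next);
        scan++;
    }
    h->current = 1 - h->current;
    h->used = next;
}
```

The work done by heap_collect is proportional to the number of reachable objects copied, not to the size of the semi-space being vacated, which is why the approach is efficient when most objects are garbage.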

A conceptually simple way of deciding when to perform such a collection opera- 
tion is simply to have it be triggered when the absence of enough free space prevents an 
attempted allocation from occurring. The mutator operation would then be interrupted to 
perform a garbage collection cycle, in which all objects reachable from the root set are
identified, and the space occupied by the other (garbage) objects is placed in a list of free
memory blocks. For certain types of applications, this approach to collection-cycle 
scheduling is acceptable and, in fact, highly efficient. 
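
By way of illustration only, the following C sketch models this scheduling policy: a collection is run only when an allocation request cannot otherwise be satisfied. The Arena structure tracks byte counts rather than real memory, and the names Arena, arena_reserve, and fixed_reclaim, as well as the callback signature, are assumptions introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting view of a collected heap. */
typedef struct {
    size_t capacity;                 /* total bytes managed */
    size_t used;                     /* bytes currently allocated */
    size_t (*collect)(void *ctx);    /* returns bytes reclaimed */
    void *ctx;                       /* passed through to collect */
    unsigned collections;            /* how many cycles have run */
} Arena;

/* Stand-in collector for demonstration: reports that the number of
 * bytes stored in *ctx were found to be garbage. */
size_t fixed_reclaim(void *ctx)
{
    return *(size_t *)ctx;
}

/* Reserve `bytes`; if there is not enough free space, interrupt the
 * mutator with a collection cycle and retry once. */
bool arena_reserve(Arena *a, size_t bytes)
{
    assert(a != NULL && a->used <= a->capacity);
    if (a->capacity - a->used < bytes) {
        size_t freed = a->collect(a->ctx);
        a->collections++;
        assert(freed <= a->used);
        a->used -= freed;
        if (a->capacity - a->used < bytes)
            return false;            /* still no room after collecting */
    }
    a->used += bytes;
    return true;
}
```

Under this policy the mutator pays nothing for collection until space runs out, but the pause then occurs at whatever point the failing allocation happens to fall.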

For many interactive and real-time applications, though, this approach is not ac- 
ceptable. The delay in mutator operation that the collection cycle's execution causes can 
be annoying to a user and can prevent a real-time application from responding to its envi- 
ronment with the required speed. In some applications, this effect can be reduced by 
choosing collection times opportunistically. For example, a garbage-collection cycle may 
be performed at a natural stopping point in the application, such as when the mutator 
awaits user input. 

So it may often be true that the garbage collection operation's effect on perform- 
ance can depend less on the total collection time than on when collections actually occur. 
But another factor that often is even more determinative is the duration of any single 
collection cycle, i.e., how long the mutator must remain quiescent at any one time. In an 
interactive system, for instance, a user may never notice hundred-millisecond interrup- 
tions for garbage collection, whereas most users would find interruptions lasting for two 
seconds to be annoying. To reduce the collector's adverse effect on mutator operation
further, many collectors operate incrementally: they reclaim less than all of the unreach- 
able objects' memory space in any one collection interval. 

Most collectors that employ incremental collection operate in "generations," al-
though this is not necessary in principle. Different portions, or generations, of the heap
are subject to different collection policies. New objects are allocated in a "young" gen- 
eration, and older objects are "promoted" from younger generations to older or more 
"mature" generations. Collecting the younger generations more frequently than the oth- 
ers yields greater efficiency because the younger generations tend to accumulate garbage 
faster; newly allocated objects tend to "die," while older objects tend to "survive." 

But generational collection greatly increases what is effectively the root set for a 
given generation. Consider Fig. 6, which depicts a heap as organized into three genera- 
tions 620, 640, and 660. Assume that generation 640 is to be collected. The process for 
this individual generation may be more or less the same as that described in connection 
with Figs. 4 and 5 for the entire heap, with one major exception. In the case of a single
generation, the root set must be considered to include not only the call stack, registers, 
and global variables represented by set 600 but also objects in the other generations 620 
and 660, which themselves may contain references to objects in generation 640. So 
pointers must be traced not only from the basic root set 600 but also from objects within 
the other generations. 

One could perform this tracing by simply inspecting all references in all other 
generations at the beginning of every collection cycle, and it turns out that this approach 
is actually feasible in some situations. But it takes too long in other situations, so work- 
ers in this field have employed a number of approaches to expediting reference tracing. 
One approach is to include so-called write barriers in the mutator process. A write bar- 
rier is code added to a write operation in the mutator code to record information from 
which the garbage collector can determine where references were written or may have 
been written since the last collection interval. For each of a plurality of heap subdivisions 
that may be collected in different increments, a respective list is kept of where references 
to objects in that heap subdivision have been found. Each list can then be maintained by 
taking such a list as it existed at the end of the previous collection interval and updating it 
by inspecting only locations identified by the write barriers as possibly modified since the 
last collection interval. 

One of the many write-barrier implementations commonly used by workers in this 
art employs what has been referred to as the "card table." Fig. 6 depicts the various gen- 
erations as being divided into smaller sections, known for this purpose as "cards." Card 
tables 610, 630, and 650 associated with respective generations contain an entry for each 
of their cards. When the mutator writes a reference in a card, it makes an appropriate en- 
try in the card-table location associated with that card (or, say, with the card in which the 
object containing the reference begins). Most write-barrier implementations simply make 
a Boolean entry indicating that the write operation has been performed, although some 
may be more elaborate. For example, assume reference 624 on card 622 is modified 
("dirtied") by the mutator, so a Boolean entry in corresponding card-table entry 605 may 
be set accordingly. The mutator having thus left a record of where new or modified ref-
erences may be, the collector may scan the card-table to identify those cards in the mature
generation that were marked as having been modified since the last collection cycle, and 
the collector can scan only those identified cards for modified references. 
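
By way of illustration only, the following C sketch pairs a reference store with a Boolean card-marking write barrier and shows the collector visiting only the slots on marked cards. The Generation layout, the card size of four reference slots, and the function names are assumptions introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CARD_SLOTS 4     /* reference slots per card (assumed) */
#define NUM_CARDS  8

/* Hypothetical mature generation: an array of reference slots divided
 * into fixed-size cards, plus one Boolean card-table entry per card. */
typedef struct {
    void *slots[CARD_SLOTS * NUM_CARDS];
    bool  dirty[NUM_CARDS];
} Generation;

/* Mutator side: the reference store followed by its write barrier. */
void store_ref(Generation *g, size_t slot, void *value)
{
    assert(slot < CARD_SLOTS * NUM_CARDS);
    g->slots[slot] = value;
    g->dirty[slot / CARD_SLOTS] = true;    /* mark the containing card */
}

/* Collector side: visit only slots on dirty cards, then clean those
 * cards. Returns the number of slots examined. */
size_t scan_dirty_cards(Generation *g, void (*visit)(void **slot))
{
    size_t examined = 0;
    for (size_t card = 0; card < NUM_CARDS; card++) {
        if (!g->dirty[card])
            continue;                      /* unmodified card: skip it */
        for (size_t i = 0; i < CARD_SLOTS; i++) {
            if (visit != NULL)
                visit(&g->slots[card * CARD_SLOTS + i]);
            examined++;
        }
        g->dirty[card] = false;
    }
    return examined;
}
```

The collector's scanning cost depends on the number of dirty cards rather than on the size of the generation, at the price of one extra store per reference write in the mutator.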

Of course, there are other write-barrier approaches, such as simply having the 
write barrier add to a list of addresses where references were written. For instance, the 
list may be stored in a sequential store buffer that is updated by write barriers in the mu- 
tator code. When the sequential store buffer is filled, the mutator may be interrupted so a 
garbage collector can reclaim unused memory based on addresses in the buffer. At the 
end of such a collection cycle, the buffer is cleared and the mutator resumes until it is in- 
terrupted again by the next garbage-collection cycle. 
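
By way of illustration only, a sequential store buffer of this kind can be sketched in C as follows. The StoreBuffer layout, its deliberately small capacity, and the function names are assumptions introduced for this example, and ssb_drain is merely a stand-in for the collection cycle.

```c
#include <assert.h>
#include <stddef.h>

#define SSB_CAPACITY 4   /* deliberately tiny for illustration */

/* Hypothetical sequential store buffer: the addresses of locations
 * where references have been written since the last collection. */
typedef struct {
    void **entries[SSB_CAPACITY];
    size_t count;
    size_t collections;   /* how many times the buffer was drained */
} StoreBuffer;

/* Stand-in for a collection cycle: a real collector would examine
 * entries[0..count) here before the buffer is cleared. */
void ssb_drain(StoreBuffer *b)
{
    b->count = 0;
    b->collections++;
}

/* Reference store plus write barrier: log the written location and
 * interrupt the mutator when the buffer becomes full. */
void ssb_store(StoreBuffer *b, void **slot, void *value)
{
    assert(b != NULL && slot != NULL);
    *slot = value;
    b->entries[b->count++] = slot;
    if (b->count == SSB_CAPACITY)
        ssb_drain(b);
}
```

Unlike a card table, the buffer records exact addresses, so the collector need not scan a whole card, but repeated writes to one location consume one entry each.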

Also, although there is no reason in principle to favor any particular number of 
generations, and although Fig. 6 shows three, most generational garbage collectors have 
only two generations, of which one is the young generation and the other is the mature 
generation. Moreover, although Fig. 6 shows the generations as being of the same size, a 
more-typical configuration is for the young generation to be considerably smaller. Fur- 
ther, each generation may be dispersed over various address ranges of memory instead of 
comprising a contiguous block of memory as shown in Fig. 6. Finally, although we as- 
sumed for the sake of simplicity that collection during a given interval was limited to 
only one generation, a more-typical approach is actually to collect the whole young gen- 
eration at every interval but to collect the mature one less frequently. 

Some collectors collect the entire young generation in every interval and may 
thereafter collect the mature generation in the same interval. It may therefore
take relatively little time to scan all young-generation objects remaining after young- 
generation collection to find references into the mature generation. Even when such col- 
lectors do use card tables, therefore, they often do not use them for finding young- 
generation references that refer to mature-generation objects. On the other hand, labori- 
ously scanning the entire mature generation for references to young-generation (or ma- 
ture-generation) objects would ordinarily take too long, so write barriers are typically 
used to set card-table entries associated with the mature generation to thereby limit the 
amount of memory the collector searches for modified mature-generation references.

Write barrier code is often inserted into mutator code in close proximity to a cor-
responding mutator instruction that modifies a reference. In an imprecise card-marking
scheme, the write barrier code marks the card-table entry that corresponds to the card in
which the modified object begins. In a precise card-marking scheme, the write barrier
marks the card-table entry that corresponds to the card in which the modified field is lo-
cated. Fig. 7 illustrates an exemplary reference-modifying mutator instruction (line N+1)
and its corresponding sequence of write-barrier code (lines N+3 through N+5) for mark-
ing a card-table entry in accordance with a precise card-marking scheme.

Fig. 7's line N+1 is an assembly instruction (STW) that stores a word-length
value into an object reference field located an offset C from the object's starting address. 
Lines N+3 through N+5 illustrate the reference-modifying STW instruction's corre- 
sponding write-barrier code. In this example, the write barrier adds three instructions not 
originally present in the mutator code: ADD, Shift Right Logical (SRL) and Store Byte 
(STB) instructions. Specifically, the instruction at line N+3 stores the address of the 
modified object field in a "working" register, and the instruction at line N+4 divides this 
address by the card size to determine how many cards into the mature generation the 
modified field is located. Here, the card size is assumed to be a power of 2 bytes. Lastly, 
the instruction at line N+5 marks a card-table entry with a binary "0" corresponding to 
the card in the mature generation that stores the modified object reference field. As de- 
scribed, each card-table entry is assumed to have a length of one byte. 
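
By way of illustration only, the effect of the store and its three-instruction write barrier can be sketched in C, with a flag selecting between the precise and imprecise card-marking schemes. The 512-byte card size, the Gen layout, the use of offsets from the start of the generation in place of machine addresses, and all function names are assumptions introduced for this example; Fig. 7 itself is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CARD_SHIFT 9                       /* assumed card size: 2^9 bytes */
#define HEAP_BYTES 4096
#define NUM_CARDS  (HEAP_BYTES >> CARD_SHIFT)

enum { CARD_DIRTY = 0, CARD_CLEAN = 1 };   /* a "0" marks the card */

/* Hypothetical generation with a one-byte-per-card table. */
typedef struct {
    unsigned char heap[HEAP_BYTES];
    uint8_t cards[NUM_CARDS];
} Gen;

void gen_init(Gen *g)
{
    memset(g->heap, 0, sizeof g->heap);
    memset(g->cards, CARD_CLEAN, sizeof g->cards);
}

/* Precise marking uses the modified field's address; imprecise
 * marking uses the address at which the object begins. */
size_t card_index(size_t obj_off, size_t field_off, int precise)
{
    size_t addr = precise ? obj_off + field_off : obj_off;  /* ADD */
    return addr >> CARD_SHIFT;                              /* SRL */
}

/* The reference store followed by its write barrier. */
void store_with_barrier(Gen *g, size_t obj_off, size_t field_off,
                        void *value, int precise)
{
    size_t addr = obj_off + field_off;
    assert(addr + sizeof value <= HEAP_BYTES);
    memcpy(g->heap + addr, &value, sizeof value);           /* STW */
    g->cards[card_index(obj_off, field_off, precise)] = CARD_DIRTY; /* STB */
}
```

Because the card size is a power of two, the division reduces to a single right shift, which is what keeps the barrier to three instructions.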

As seen with regard to Fig. 7, the inclusion of write barriers in the mutator code
increases the size of mutator code, e.g., by three instructions per reference-modifying 
mutator instruction. Further, one or more additional instructions (not shown) may have 
to be added to the mutator code to store the base memory addresses of the card tables 
whose entries are marked by the mutator's write-barrier code. Clearly, this added code 
overhead may significantly increase the mutator's execution time, especially when the 
mutator code contains a relatively large number of reference-modifying instructions. So 
adding write barriers to increase the garbage collector's efficiency tends to compromise 
the mutator's. 

SUMMARY OF THE INVENTION 

The present invention provides a technique for reducing the number of write bar- 
riers without compromising garbage-collector performance or correctness. Convention- 
ally, a compiler emits write-barrier code at a location immediately following a reference- 
modifying instruction in the mutator code. In contrast, a compiler embodying the inven- 
tion may defer emitting the instruction's write-barrier code until a subsequent location in 
the mutator code, i.e., not immediately following the reference-modifying instruction. By 
deferring write barriers in this manner, the compiler can analyze the deferred write barri- 
ers and combine those that, when executed, provide the same information to the garbage 
collector. Preferably, the compiler emits the remaining, uncombined deferred write bar- 
riers at consecutive locations in the mutator code. Because redundant or unnecessary 
write-barrier code is removed from the mutator code, the inventive technique can mini- 
mize the amount of write-barrier overhead in the mutator, thereby enabling the mutator to 
execute faster and more efficiently. 

In an illustrative embodiment, the compiler maintains a list of where, in an allo- 
cated region of memory, references are modified by mutator instructions. For every ref- 
erence-modifying mutator instruction, the compiler creates an entry in the list in lieu of 
emitting the instruction's corresponding write-barrier code into the mutator code. Each 
list entry stores at least enough information for the compiler to generate the deferred 
write-barrier code corresponding to the entry's associated reference-modifying mutator 
instruction. When the compiler reaches a predetermined location in the mutator code, the 
compiler generates and emits the mutator's deferred write barriers based on the contents 
of the list's entries. 

Advantageously, at some time before the deferred write barriers are emitted, the 
compiler may scan the list to remove or combine entries that would result in the compiler 
generating write-barrier code that, when executed, performs the same garbage-collection 
operations. Accordingly, the compiler eliminates redundant or unnecessary list entries, 
then emits deferred write-barrier code corresponding to the remaining list entries. For 
example, the compiler may combine identical entries in the list since they correspond to 
the same deferred write-barrier code. The compiler also may combine or remove entries 
that correspond to mutator instructions that modify references in the same region of 
memory, such as a card, when the entries' deferred write-barrier code would provide the 
collector with the same information. In this way, the compiler can reduce the amount of 
write-barrier code emitted in the mutator without negatively affecting the garbage col- 
lector's performance. Preferably, the remaining deferred write barriers (i.e., those that 
are not combined or elided) are emitted sequentially at a predetermined point in the mu- 
tator code, thereby enabling more efficient use of guard code and improving cache per- 
formance when the mutator is executed. 

At run-time, a garbage-collection interval may interrupt the mutator's execution 
before the mutator's deferred write barriers have been executed. When this occurs, the 
garbage collector may refer to a compiler-generated list identifying where unrecorded 
reference modifications were made in the heap before the mutator's execution was inter- 
rupted. According to the illustrative embodiment, the collector may combine or elide re- 
dundant or unnecessary entries in the compiler-generated list, then rely on the list's re- 
maining entries to perform its garbage-collection functions. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The above and further advantages of the invention may be better understood by 
referring to the following description in conjunction with the accompanying drawings in 
which like reference numerals indicate identically or functionally similar elements, of 
which: 

Fig. 1, previously discussed, is a schematic block diagram of a computer system 
of a type in which the present invention's teachings can be practiced; 

Fig. 2, previously discussed, is a schematic block diagram illustrating a simple 
source-code compilation operation; 

Fig. 3, previously discussed, is a schematic block diagram of a more complex 
compiler/interpreter organization; 

Fig. 4, previously discussed, is a schematic block diagram that illustrates a basic 
garbage collection mechanism; 

Fig. 5, previously discussed, is a schematic block diagram illustrating the relocation
operation of the garbage collection mechanism of Fig. 4; 

Fig. 6, previously discussed, is a schematic block diagram that illustrates a gar- 
bage collected heap's organization into generations; 

Fig. 7, previously discussed, is an exemplary source code listing of a write barrier 
that may be used in accordance with the present invention; 

Fig. 8 is exemplary method code having deferred write barriers emitted at the end 
of the method; 

Fig. 9 is a schematic block diagram of a table that may store information that en- 
ables a compiler to generate the deferred write barriers in Fig. 8; 

Fig. 10 is an exemplary method that includes deferred write-barrier code, where 
the method comprises two instructions that modify the same object reference field; 

Fig. 11 is a schematic block diagram of a table having entries that may be combined
or elided to generate a new, condensed table used to generate the deferred write 
barriers in Fig. 10; 

Fig. 12 is a schematic block diagram of an object aligned along double-word 
boundaries; 

Fig. 13 is an exemplary method that includes deferred write-barrier code, where 
the method comprises two instructions that modify object reference fields in the same 
double-word; 

Fig. 14 is a schematic block diagram of a table having entries that may be com- 
bined or elided to generate a new, condensed table used to generate the deferred write 
barriers in Fig. 13; 

Fig. 15 is a schematic block diagram of an object having a plurality of object ref- 
erence fields located within a distance of a card-length; 

Fig. 16 is an exemplary method that includes deferred write-barrier code, where 
the method comprises instructions that modify object reference fields located within a 
known range of memory addresses; 

Fig. 17 is a schematic block diagram of a table having entries that may be com- 
bined or elided to generate a new, condensed table used to generate the deferred write 
barriers in Fig. 16; 

Fig. 18 is a schematic block diagram of an exemplary extended basic block and its 
associated compiler-generated data structures that may be used by a compiler to generate 
deferred write barriers; 

Fig. 19 is a schematic block diagram depicting an exemplary mutator code's con- 
trol flow through a plurality of basic blocks, one of which is a "join" node, and the mu- 
tator's associated data structures that may be used by a compiler to generate deferred 
write barriers in the mutator; 

Fig. 20 is an exemplary method whose write barriers are deferred across a call in- 
struction to another method; 

Fig. 21 is an exemplary method that includes deferred write-barrier code, where 
the method comprises instructions that "spill" the contents of a hardware register into a 
stack-frame slot; 

Fig. 22 is a schematic block diagram of a table having entries that may be used to 
generate the deferred write barriers emitted in the method shown in Fig. 21; 

Fig. 23 is an exemplary method that includes deferred write-barrier code whose 
execution depends on the result of a guard instruction; 

Fig. 24 is a schematic block diagram of a table having entries that may be used to 
generate the deferred write barriers in Fig. 23; 

Fig. 25 is an exemplary method that includes deferred write-barrier code, where 
the method comprises a plurality of possible safe points; 

Fig. 26 is a schematic block diagram illustrating exemplary mappings between the 
"internal" tables the compiler uses to record where references are modified without corre- 
sponding write-barrier code and their corresponding run-time tables generated by the 
compiler at two illustrative possible safe points; 

Fig. 27 is a schematic block diagram of a pointer table and its set of associated ta- 
bles storing information that may be used by a compiler to generate the deferred write 
barriers in Fig. 25; 

Fig. 28 is a flowchart illustrating a sequence of steps a garbage collector may per- 
form to locate where references were modified by a mutator without execution of corre- 
sponding write-barrier code; 

Fig. 29 is an exemplary extended basic block having a plurality of possible safe 
points, where write-barrier code in the extended basic block is deferred across a call in- 
struction; 

Fig. 30 is a schematic block diagram of an exemplary call stack a garbage collec- 
tor may use to locate where references were modified in an extended basic block without 
execution of corresponding write-barrier code; 

Figs. 31A-B are a flowchart illustrating a sequence of steps a garbage collector 
may perform to locate where, in an extended basic block, references were modified with- 
out execution of corresponding write-barrier code; 

Fig. 32 is a schematic block diagram of an exemplary call stack that relies on a 
block of "trampoline" code to record which stack frames should be scanned by the col- 
lector as the collector locates write barriers whose execution was deferred at the time of 
its collection interval; and 

Fig. 33 is a flowchart illustrating a sequence of steps that may be executed by the 
trampoline code in Fig. 32. 

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

I. DEFERRING AND COMBINING WRITE BARRIERS IN COMPILED CODE 

A. Deferring write barriers in compiled code 

Conventionally, a compiler emits a write barrier (i.e., write-barrier code, as shown 
in Fig. 7) at a location in the mutator code immediately following a reference-modifying 
mutator instruction. In contrast, a compiler embodying the present invention will not al- 
ways emit a write barrier immediately following its corresponding reference-modifying 
instruction. Instead, the compiler may defer emitting the write-barrier code until a subse- 
quent location in the mutator code. To that end, the compiler may maintain a list of ref- 
erence-modifying instructions whose write barriers have been deferred. At some later 
point in the mutator code, the compiler may emit write barriers based on information 
stored in the list. Notably, the deferred write-barrier code may be emitted in accordance 
with various write-barrier implementations, such as sequential store buffers, precise card- 
marking, imprecise card-marking, and so forth. 

Fig. 8 is an exemplary listing of pseudo-code 800 for a mutator method m. 
Among the method's instructions are reference-modifying instructions 810, 820, and 830 
(labeled A, B, C, respectively). A compiler that compiles the exemplary method m for a 
garbage-collected system would conventionally emit write-barrier code immediately after 
emitting each reference-modifying instruction. That is, a write barrier 840 would con- 
ventionally be emitted immediately after instruction 810, a write barrier 850 would be 
emitted immediately after instruction 820 and a write barrier 860 would be emitted im- 
mediately after instruction 830. However, as shown in Fig. 8, the compiler in the illus- 
trative embodiment does not immediately emit those write barriers after their corre- 
sponding reference-modifying mutator instructions. Rather, the compiler defers emission 
of the write barriers 840, 850, and 860 until a later point in the mutator code. 

In general, the compiler converts source-code or byte-code representations of the 
mutator method m into machine-level instructions that may be executed by a processor. 
The compiler typically performs one or more data-flow analyses as it converts the received
mutator code into machine-level instructions. As used herein, the mutator's data flow
defines the logical progression in which the compiled mutator instructions will be executed
by the processor. Thus, the compiler's data-flow analyses may be used to determine an
optimized sequence of instructions that may be emitted into the compiled mutator code.
To that end, the compiler may perform one or more "passes" through the 
mutator code in order to analyze the code's data flow. At each pass, the compiler often 
"tracks" state information related to values and expressions that it manipulates when gen- 
erating the optimized sequence of mutator instructions. This state information can be 
propagated among the various compilation passes to facilitate generation of the compiled 
mutator code. 

Value numbers provide one mechanism the compiler may employ for tracking 
values and expressions in the compiled mutator code. The compiler assigns a unique 
value number to each abstract value or expression that it manipulates while compiling the 
mutator code. For example, different value numbers may be assigned to objects, arrays, 
reference values, etc. allocated by instructions in the mutator method m. In some cases, 
the compiler can identify a specific reference value in the heap solely based on the reference
value's associated value number. However, more generally, the compiler identifies 
a reference value based on a value number in combination with other information. For 
instance, if a first value number is assigned to an object allocated in the method m, then 
the combination of the first value number with an offset value can specify a particular 
object-reference field in the object. Similarly, a second value number assigned to a refer- 
ence array may be combined with a mathematical expression specifying a particular ref- 
erence-array element. Value numbers and their uses are described in more detail in 
Chapter 12 of Advanced Compiler Design and Implementation, by Steven Muchnick, 
published 1997, which is hereby incorporated by reference as though fully set forth 
herein. 

Although value numbers are symbolic representations of values and expressions 
manipulated by the compiler, the compiler does not emit mutator instructions in terms of 
its internal value numbers. Instead, the compiler converts the value numbers to equiva- 
lent run-time expressions that may be incorporated in the compiled mutator code. For 
instance, the compiler may maintain a table that associates value numbers of objects and 
arrays with the specific registers and/or stack-frame slots that store run-time memory lo- 
cations of those objects and arrays. This table may be dynamically updated during the 
compilation process as the memory locations of objects and arrays are transferred among 
different registers and stack-frame slots. 

For example, suppose an instruction in the mutator method m allocates an object o 
and the compiler assigns the object a value number equal to valnum_o. Further suppose 
that the instruction stores the memory location of the object o in the register register_o. 
In this case, the compiler refers to the object o as valnum_o in its (internal) intermediate 
representations of the compiled mutator code, even though the compiler explicitly emits 
mutator instructions for the object o in terms of register_o. Next, assume that at some 
later point in the mutator code the compiler emits instructions that copy the object o's 
memory location out of register_o and into a stack-frame slot slot_o. Subsequently, 
the compiler continues to reference the object during its compilation processes as
valnum_o, although compiled instructions referring to the object are now emitted in terms of 
slot_o (rather than in terms of register_o). 
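
The bookkeeping in this example can be illustrated with a short sketch. It is not taken from the illustrative embodiment; the dictionary location_of and the helper names bind and operand are hypothetical, and locations are represented simply as strings.

```python
# Hypothetical compiler-side table mapping value numbers to the register or
# stack-frame slot that currently holds the corresponding run-time address.
location_of = {}


def bind(value_number: str, location: str) -> None:
    """Record (or update) where the value's address lives at run time."""
    location_of[value_number] = location


def operand(value_number: str) -> str:
    """Translate an internal value number into the operand to emit."""
    return location_of[value_number]


bind("valnum_o", "register_o")   # the object's address starts out in a register
first = operand("valnum_o")      # instructions are emitted in terms of register_o

bind("valnum_o", "slot_o")       # later the address is copied to a stack-frame slot
second = operand("valnum_o")     # instructions are now emitted in terms of slot_o
```

The value number stays fixed throughout compilation; only the table entry it maps to changes.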

Fig. 9 illustrates a table 900 that may be used by the compiler to store a list of 
memory locations where Fig. 8's reference-modifying instructions A, B and C modify 
references in the heap. The list stored in table 900 includes entries 910, 920 and 930 in- 
dicating value-number expressions that respectively correspond to the reference values 
modified by the mutator instructions A, B and C. Illustratively, the entry 910 stores a 
value number and offset pair that collectively identify a reference value modified by the 
mutator instruction 810. The entry 920 stores a value number corresponding to the refer- 
ence value modified by the instruction 820. And the entry 930 stores a value number and 
a mathematical expression that collectively identify the reference value modified by the 
instruction 830. Advantageously, the compiler emitting Fig. 8's mutator code dynami- 
cally updates the table 900 as it compiles the mutator code. 

Based on the table entries 910, 920 and 930, the compiler can emit the deferred 
write barriers 840-860, e.g., at the end of the method m. More specifically, the compiler 
uses the value-number expressions stored in the entries 910-930 to identify which refer- 
ence values are modified in the mutator method m without emission of corresponding 
write-barrier code. For each value-number expression, the compiler first identifies a 
memory location associated with the object, array, etc. whose value number is stored in 
the value-number expression. To that end, the compiler may maintain a table (not 
shown) that maps the value number to a particular register or stack-frame slot accessible 
in the run-time system. The compiler then combines the value number's associated 
memory location (e.g., register or slot) with the remaining information in the value- 
number expression to generate and emit the appropriate write-barrier code 840-860. 
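
A minimal sketch of this defer-then-emit flow follows. It is an illustration under stated assumptions, not the embodiment's implementation: the class DeferringEmitter, the Entry record, the write_barrier(...) text, and the value numbers valnum_r and valnum_arr are all hypothetical, and emitted instructions are modeled as strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    value_number: str   # identifies the modified object or array
    rest: str           # offset or index expression; "" if none is needed


class DeferringEmitter:
    def __init__(self, location_of):
        self.location_of = location_of   # value number -> register or slot
        self.code = []                   # emitted instructions, in order
        self.deferred = []               # the compiler's internal table

    def emit_store(self, text, entry):
        self.code.append(text)
        self.deferred.append(entry)      # record an entry instead of a barrier

    def flush(self):
        """Emit one barrier per recorded entry, then empty the table."""
        for e in self.deferred:
            base = self.location_of[e.value_number]
            target = f"{base}+{e.rest}" if e.rest else base
            self.code.append(f"write_barrier({target})")
        self.deferred.clear()


em = DeferringEmitter({"valnum_o": "register_o",
                       "valnum_r": "register_r",
                       "valnum_arr": "slot_arr"})
em.emit_store("A", Entry("valnum_o", "offset_f"))   # value number and offset
em.emit_store("B", Entry("valnum_r", ""))           # value number alone
em.emit_store("C", Entry("valnum_arr", "4*i"))      # value number and expression
em.flush()                                          # e.g., at the end of the method
```

After the flush, the three barriers sit consecutively behind instructions A, B and C, mirroring the layout of Fig. 8.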

For purposes of discussion, the deferred write-barrier code is emitted at the end of 
the method m 800. However, it is expressly contemplated that the compiler may emit the 
deferred write-barrier code at other locations in the emitted mutator code as well. For instance, 
as more write barriers are deferred in the method 800, the compiler's table 900 may 
"grow" unreasonably large and consume an excessive amount of memory resources. 
Therefore, the compiler may periodically emit some (or all) of its deferred write-barrier 
code at locations prior to the last mutator instruction so as to conserve resources, such as 
memory usage, during its compilation process. As described in section II below, such 
periodic "winnowing" of entries in the table 900 enables the compiler to generate mutator 
code having certain run-time advantages as well. 

Advantageously, deferring write-barrier emission at compile time, as illustrated in 
Figs. 8 and 9, enables the compiler to eliminate unnecessary or redundant write-barrier 
code from the compiled mutator code, regardless of which type of write-barrier imple- 
mentation is employed (e.g., sequential store buffers, precise card marking, imprecise 
card marking, etc.). As a result, the mutator may contain less write-barrier overhead and 
consequently execute faster and more efficiently at run time. Furthermore, by collocating 
deferred write barriers in the mutator code, the compiled mutator code can also exhibit 
improved cache performance at run time. 

B. Combining write barriers in compiled code 

(i) Emitting a single write barrier for multiple reference modifications to the 
same memory location 

When write-barrier emission has been deferred for multiple reference-modifying 
mutator instructions, the compiler may be able to determine that two or more of the de- 
ferred write barriers correspond to reference modifications made to the same memory lo- 
cation. In such a case, the compiler may be able to reduce the number of deferred write 
barriers emitted in the mutator code. Suppose, for example, that Fig. 8's instructions 810, 
820, and 830 take the form of Fig. 10's reference-modifying instructions 1010, 1020, and 
1030. Specifically, instruction 1010 (o.f = a;) modifies a field f in an object o, instruction
1020 (o.f = b;) modifies the same field f in the object o, and instruction 1030 (x.f = c;) 
modifies a field f in a different object x. 

Suppose further that, in deferring write-barrier emission for those reference modifications,
the compiler has kept a list, e.g., in Fig. 11's table 1100, of where Fig. 10's
reference-modifying instructions modify references in the heap without a corresponding 
write barrier. The table 1100 therefore contains three entries 1110, 1120 and 1130
respectively corresponding to the reference-modifying instructions 1010, 1020 and 1030. 
In this example, the value numbers valnum_o and valnum_x correspond to the memory 
locations of the objects o and x, respectively, and the offset value offset_f stores a relative 
offset of an object field f. 

Because the value number and offset pairs stored in table entries 1110 and 1120 
are identical, the compiler can conclude that the entries 1110 and 1120 correspond to
reference-modifying instructions (e.g., instructions 1010 and 1020) modifying the same 
memory location. Consequently, the entries would result in the compiler generating 
identical deferred write-barrier code. In accordance with the illustrative embodiment, the 
compiler may remove one of the redundant entries 1110 or 1120 from the table 1100, 
thereby resulting in a condensed table 1150. Since the resultant table 1150 contains only 
two entries 1160 and 1170 corresponding to deferred write barriers, the compiler emits 
only two deferred write barriers 1040 and 1050, e.g., at the end of the mutator code 
1000. 
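
The condensing step can be sketched as an order-preserving removal of duplicate entries. The tuple representation of table entries and the function name remove_identical are assumptions chosen for illustration only.

```python
def remove_identical(entries):
    """Drop repeated (value_number, offset) pairs, keeping first occurrences.

    dict.fromkeys preserves insertion order, so the surviving entries stay in
    the order in which the reference modifications were recorded.
    """
    return list(dict.fromkeys(entries))


table_1100 = [
    ("valnum_o", "offset_f"),   # entry for instruction 1010
    ("valnum_o", "offset_f"),   # entry for instruction 1020 (identical)
    ("valnum_x", "offset_f"),   # entry for instruction 1030
]
table_1150 = remove_identical(table_1100)
```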

(ii) Combining deferred write barriers that correspond to reference 
modifications in the same object or card 

As noted, the compiler may construct and manage a table with which it can gen- 
erate and emit deferred write-barrier code at a predetermined point in a mutator code. 
While the compiler may elide identical table entries (as described in Figs. 10 and 11), the 
compiler may also remove or combine table entries that may not be identical, but would 
generate the same write-barrier code. For example, two entries may correspond to refer- 
ence-modifying mutator instructions that modify reference fields located in the same 
card. Assuming the garbage collector is configured in a precise card-marking scheme, 
the two entries provide the collector with the same information and therefore may be 
combined into a single table entry. Similarly, in an imprecise card-marking scheme, en- 
tries corresponding to reference modifications in the same object provide the collector 
with the same information and thus may be combined into a single entry. 
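
For the imprecise card-marking case, combining can be sketched as keeping one entry per object. This is a hedged illustration rather than the embodiment's method: it assumes entries are (value number, offset) tuples and treats two entries as naming the same object only when their value numbers are equal. The function name is hypothetical.

```python
def combine_for_imprecise_marking(entries):
    """Keep one (value_number, offset) entry per object.

    With imprecise marking the barrier marks the card in which the object
    begins, so only the object's identity matters, not the field offset.
    """
    first_per_object = {}
    for value_number, offset in entries:
        first_per_object.setdefault(value_number, (value_number, offset))
    return list(first_per_object.values())


entries = [("valnum_o", 8), ("valnum_o", 24), ("valnum_x", 8)]
condensed = combine_for_imprecise_marking(entries)
```

If two different value numbers happen to refer to the same object, this sketch simply fails to combine them; it emits a redundant barrier but never omits a required one.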

In practice, the compiler may determine that two or more table entries correspond to 
reference modifications made in the same card based on a known alignment of objects or 
arrays within the card. For instance, Fig. 12 illustrates two cards 1200 and 1210 in which 
objects are aligned along double-word boundaries 1220. An exemplary object o (1230) 
begins along one of the double-word boundaries in card 1200. The object comprises a 
plurality of word-sized reference fields, such as adjacent fields f and g (1240 and 1250). 
Those skilled in the art will understand that the object 1230 is illustrated for exemplary 
purposes, and the technique of the present invention equally applies to objects and arrays 
having other known alignments and different object-field sizes than those depicted. 

Fig. 13 illustrates pseudo-code of an exemplary method m 1300 comprising in- 
structions 1310 and 1320 for modifying reference values stored in the object o 1230. The 
reference-modifying instruction 1310 modifies the field f (1240) located at an offset of 2 
words from the beginning of the object; the instruction 1320 modifies the field g (1250) 
located at an offset of 3 words in the object. For the sake of explanation, it is assumed that a 
and b are reference values that may be stored in an object's reference field, and that the
compiler has emitted mutator instructions that respectively store the reference values in the 
registers register_a and register_b. Furthermore, assume the compiler assigns the object 
o a value number equal to valnum_o. 

Fig. 14 depicts a table 1400 having entries 1410 and 1420 respectively corre- 
sponding to the instructions 1310 and 1320. Each of the entries stores at least enough 
information for the compiler to generate and emit a write barrier, e.g., in accordance with 
a precise card-marking scheme. Specifically, each entry stores a pair of value number 
and offset values identifying a respective object-reference field in the object 1230 that is 
modified by instructions in the method 1300 without emission of write-barrier code. For 
example, the entry 1410 stores the object's value number valnum_o and an offset value 
equal to 2, e.g., measured in units of words, corresponding to the object-reference field f 
(1240) modified by the instruction 1310. Entry 1420 similarly stores a pair of values 
"valnum_o, 3" corresponding to the reference field g (1250) modified by the instruction 
1320. 

The compiler may combine entries in the table 1400 that generate the same
write-barrier code. In a precise card-marking scheme, the compiler may determine that entries in 
the table 1400 generate the same write-barrier code if they correspond to reference
modifications made in the same card. Assuming the compiler is aware that object 1230 is
located in a region of memory where objects are aligned along double-word boundaries, the 
compiler can determine that the entries 1410 and 1420 correspond to reference modifications 
located in the same double-word and therefore, assuming no card boundary falls within a
double-word, in the same card. 

The compiler may therefore create a new, condensed table 1450 that either re- 
moves one of the entries 1410 or 1420 or combines them into a single entry. For pur- 
poses of discussion, the compiler in this example removes the entry 1420. Based on the 
contents of the remaining entry in the condensed table 1450, the compiler generates and 
emits only a single deferred write barrier 1330 at the end of the method 1300, even 
though write barriers were initially deferred for two reference-modifying instructions 
1310 and 1320 in the mutator code. 
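
The same-double-word test can be sketched as follows. The sketch assumes that offsets are measured in words, that each object starts on a double-word boundary, and that no card boundary falls inside a double-word. The function and constant names are hypothetical.

```python
WORDS_PER_DOUBLE_WORD = 2


def combine_same_double_word(entries):
    """Keep one entry per (object, double-word) pair.

    entries: (value_number, offset_in_words) pairs for objects assumed to
    start on a double-word boundary, so offset // 2 names the double-word.
    """
    kept = {}
    for value_number, offset in entries:
        key = (value_number, offset // WORDS_PER_DOUBLE_WORD)
        kept.setdefault(key, (value_number, offset))
    return list(kept.values())


table_1400 = [("valnum_o", 2), ("valnum_o", 3)]   # fields f and g
table_1450 = combine_same_double_word(table_1400)
```

Offsets 2 and 3 share a double-word and are combined, whereas offsets 1 and 2 would not be, because they straddle a double-word boundary and could therefore lie in different cards.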

(iii) Combining deferred write barriers corresponding to reference 
modifications made within a known range of memory addresses 

When the compiler identifies first and second reference modifications made 
within a predetermined distance, such as a card-size, from each other, the compiler 
sometimes may elide deferred write barriers which correspond to other reference modifi- 
cations made in-between the first and second modifications. For example, in a precise 
card-marking scheme, two reference modifications made within a card-length from each 
other must have occurred in the same or adjacent cards. Therefore, if the compiler emits 
a write barrier for each of these two modifications, write-barrier code corresponding to 
reference modifications made in-between the two modifications cannot provide new
information to the garbage collector. 

Fig. 15 illustrates two cards 1510 and 1520, each card having a card-size C, which 
may be measured in units of bytes, words, double-words, etc. An exemplary object o 
(1530) begins on card 1510 and spans both cards. The object comprises a plurality of 
reference fields, among which are field x (1540) and field y (1560) located in card 1510, 
and field z (1550) located in card 1520. 

Fig. 16 illustrates pseudo-code for an exemplary method m 1600 having refer- 
ence-modifying instructions 1610, 1620 and 1630 that modify object 1530's reference 
fields x, z and y respectively. Here, it is assumed that the compiler has emitted instructions
to store the reference values a, b and c respectively in the registers register_a,
register_b and register_c. Further, it is assumed that the compiler has emitted a mutator
instruction for storing the memory location of the object 1530 in the register register_o. 
The compiler may generate a data structure, such as Fig. 17's table 1700, to record in- 
formation that the compiler may later use to generate deferred write-barrier code in the 
method 1600. The table 1700 comprises entries 1710, 1720 and 1730 corresponding to 
the reference-modifying instructions 1610, 1620 and 1630. As shown for a precise
card-marking implementation, each entry stores the value number valnum_o assigned to the 
object 1530 and an offset value corresponding to a modified reference field in the object 
1530. 

For purposes of explanation, assume the compiler determines the distance be- 
tween field x (1540) and field z (1550) is less than or equal to the card size C, e.g., based 
on the information stored in table entries 1710 and 1720. The compiler may therefore 
determine that the modification to field y (1560), which is located in-between fields x and z, 
corresponds to the same write-barrier code as either the modification to field x or the 
modification to field z, or both. For this reason, the compiler may eliminate the entry 
1730 from table 1700, and create a new, condensed table 1750 having entries 1760 and 
1770. Entry 1760 comprises the same information as 1710, and entry 1770 comprises the 
same information as 1720. The compiler relies on entries in the condensed table 1750 to 
generate the write-barrier code 1640 and 1650. Accordingly, the compiler emits write 
barriers 1640 and 1650, e.g., at the end of the mutator code, using information stored in 
the entries 1760 and 1770. In this manner, the redundant write-barrier code corresponding
to the reference modification made in field y is not emitted into the compiled mutator 
code. 

Similarly, if the method 1600 included other reference-modifying instructions in 
addition to those shown, the compiler may elide any other table entries corresponding to 
reference modifications made in-between the fields x and z. Further, when a plurality of 
reference modifications are all made within a range of memory addresses spanning a 
distance less than or equal to a card-length, the compiler may first identify, from the
plurality of modified fields, the two fields located farthest apart; write barriers for
modifications made in-between these two outer-most fields can then be elided, as described above. 
Those skilled in the art will appreciate that deferred write barriers for reference writes
made across several cards may be combined in a similar fashion. 
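
The outer-most-fields rule can be sketched and checked numerically. The sketch below handles only the single-range case described above, with offsets and card size in bytes; the function names and the sample offsets (8, 200 and 400 with a 512-byte card) are assumptions for illustration. The helper cards_marked models which cards a precise barrier would mark for a given object base address.

```python
def elide_in_between(offsets, card_size):
    """Return the field offsets for which barriers are still needed.

    If all modified fields of one object lie within a card-length of each
    other, only the two outer-most fields need write barriers.
    """
    if not offsets:
        return []
    lo, hi = min(offsets), max(offsets)
    if hi - lo <= card_size:
        return sorted({lo, hi})
    return sorted(set(offsets))       # range too wide: keep every entry


def cards_marked(base, offsets, card_size):
    """Cards that precise barriers would mark for an object at address base."""
    return {(base + off) // card_size for off in offsets}


all_offsets = [8, 400, 200]           # fields x, z and y of one object
kept = elide_in_between(all_offsets, 512)
```

Wherever the object happens to start, the kept barriers mark exactly the cards that the full set would mark, since two addresses at most one card-length apart fall in the same or adjacent cards.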

C. Combining deferred write barriers in extended basic blocks 

Up to this point, the techniques of the present invention have been discussed with 
regard to combining or eliding deferred write barriers for a sequence of compiled mutator
instructions in a basic block. As used herein, a basic block is defined as a linear
sequence of instructions terminating with a call or branch instruction, such that no
instruction in the basic block other than the first is a target of a branch or call instruction. Thus, 
a basic block has a single entry point and a single exit point. For example, methods 800, 
1000, 1300 and 1600 illustrate exemplary mutator methods implemented as basic blocks. 
However, the data flow of mutator code is often more complex than a single basic block. 

For instance, the mutator may comprise a plurality of basic blocks that are inter- 
connected to form an extended basic block. As used herein, an extended basic block is an 
arrangement of basic blocks having a single entry point and possibly multiple exit points. 
An extended basic block therefore may be thought of as a hierarchical arrangement of 
basic block nodes having no "joins" or "loops" among the nodes. More generally, ex- 
tended basic blocks are described in more detail in Chapter 7 of the reference book enti- 
tled Advanced Compiler Design and Implementation, by Steven Muchnick, published 
1997, which is hereby incorporated by reference as though fully set forth herein. 

In accordance with the illustrative embodiment, the compiler may defer emission 
of write barriers across one or more of the basic blocks in an extended basic block. For 
instance, Fig. 18 illustrates an extended basic block 1800 which contains a plurality of 
interconnected basic blocks (e.g., methods) 1810, 1820, 1830 and 1840. Here, the illus- 
trated extended basic block 1800 has a single entry point into the basic block 1810 and 
two possible exit points out of basic blocks 1830 and 1840. Each of the basic blocks 
1810-1840 contains a linear sequence of mutator instructions. 

Advantageously, for every basic block in the extended basic block 1800, the com- 
piler may generate a corresponding list indicating where references are modified in the 
heap by mutator instructions whose write barriers are deferred before the mutator's data 
flow enters the basic block. For example, when the mutator's data flow exits the basic 
block 1810 and continues at the start of the basic block 1820, only one write barrier is 
deferred, e.g., for the reference-modifying instruction A, in this "branch" of the mutator's 
data flow. Consequently, the compiler may generate a table 1850 containing a single entry
that indicates that the deferred write barrier for the instruction A in the basic block 1810 is "inherited" by the 
basic block 1820. By way of example, the table 1850 stores a pair of value number and 
offset values that collectively identify to the compiler where the reference-modifying in- 
struction A modified a reference in the heap. In this example, the compiler can also 
identify the memory locations of each of the other reference-modifying instructions B-E 
in the extended basic block 1800 by a corresponding pair of value number and offset val- 
ues. 

Because the basic block 1820 contains one reference-modifying instruction C, the 
list of deferred write barriers is augmented, as shown by the two entries in the table 1860, 
and passed to the next basic block 1830 in this branch of the extended basic block's data 
flow. The compiler may then emit the branch's deferred write barriers 1870, including 
any write barriers that are deferred in the basic block 1830, at a predetermined point in 
the extended basic block, such as at the end of the basic block 1830. Further to the illustrative
embodiment, the compiler may combine or elide table entries, as previously described
in sections B(i)-(iii) of this disclosure, to reduce the number of deferred write barriers
emitted at the end of the basic block 1830.
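
The way deferred-barrier lists are inherited along the branches of an extended basic block can be sketched as below. This is an illustrative model only: the block names b1 to b4, the dictionary layout, and the function propagate are hypothetical and do not reproduce Fig. 18. The sketch also assumes a simplification that the text does not mandate, namely that an entry is passed on only when its object's location is available in every successor; otherwise the barrier is emitted in the current block.

```python
def propagate(blocks, entry):
    """Walk a tree of basic blocks and return two dicts:
    inherited[b] = deferred entries handed to block b on entry,
    emitted[b]   = deferred entries whose barriers are emitted in block b."""
    inherited = {entry: []}
    emitted = {}
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        block = blocks[name]
        pending = inherited[name] + block["writes"]
        succs = block["succs"]
        if not succs:
            emitted[name] = pending  # exit point: emit everything still deferred
            continue
        # An entry can be inherited only if its value number is held in a
        # register or stack-frame slot in the successor blocks.
        passed_on = [e for e in pending
                     if all(e[0] in blocks[s]["available"] for s in succs)]
        emitted[name] = [e for e in pending if e not in passed_on]
        for s in succs:
            inherited[s] = list(passed_on)
            worklist.append(s)
    return inherited, emitted


# Hypothetical extended basic block: b1 branches to b2 (then b3) or to b4.
blocks = {
    "b1": {"writes": [("valnum_a", 0)], "succs": ["b2", "b4"],
           "available": {"valnum_a"}},
    "b2": {"writes": [("valnum_c", 8)], "succs": ["b3"],
           "available": {"valnum_a", "valnum_c"}},
    "b3": {"writes": [], "succs": [], "available": {"valnum_a"}},
    "b4": {"writes": [("valnum_e", 4)], "succs": [],
           "available": {"valnum_a", "valnum_e"}},
}
inherited, emitted = propagate(blocks, "b1")
print(inherited["b3"])  # [('valnum_a', 0)]
print(emitted["b2"])    # [('valnum_c', 8)]
```

In this example the write to valnum_c cannot be inherited by b3, so its barrier is emitted at the end of b2, while the barrier for valnum_a is carried to each exit block. The lists in emitted could then be condensed by the combining and eliding techniques before code is generated.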

Here, it is noted that write-barrier code deferred in a first basic block may be in- 
herited by a second basic block only for reference modifications made to memory loca- 
tions that are stored in register and/or stack-frame slots available in both the first and sec- 
ond basic blocks. For example, if the reference-modifying instruction C in basic block 
1820 modifies a reference value stored in an object whose memory location is accessible 
only in the block 1820, then the deferred write barrier for this reference modification may 
not be inherited by the basic block 1830. Instead, the deferred write barrier for this reference
modification may be emitted at a predetermined point in the block 1820, e.g., at the
end of the basic block 1820.

In the other branch of the extended basic block, the mutator's data flow passes
from the end of the basic block 1810 to the beginning of the basic block 1840. In this 
branch, the compiler defers emission of write-barrier code for the reference-modifying 
instructions A and B in the basic block 1810. Thus, the compiler may generate a table 
1880 indicating that these write barriers are deferred before the mutator's data flow 
passes from the basic block 1810 to the basic block 1840. Accordingly, the compiler 
may emit the deferred write-barrier code 1890, corresponding to reference-modifying in- 
structions in the basic blocks 1810 and 1840, at a predetermined point in the basic block 
1840. The compiler may combine and elide table entries to reduce the number of deferred 
write barriers 1890, as previously described herein. Again, it is noted that write-barrier 
code for the reference-modifying instructions A and B may only be deferred from the ba- 
sic block 1810 to the basic block 1840 when their modified reference locations are stored 
in registers and/or stack-frame slots available in both basic blocks. Accordingly, the 
compiler may emit deferred write-barrier code for modified reference locations that are 
only accessible in the basic block 1810 at a predetermined point in the block 1810. 

The compiler may employ a similar technique for deferring write barriers in mu- 
tator data flows having one or more "join" nodes. Fig. 19 illustrates a mutator 1900's 
flow diagram having a plurality of basic blocks 1910, 1920, 1930 and 1940. Although 
the mutator's execution has a single entry point (at the beginning of 1910) and a single 
exit point (at the end of 1940), the mutator's data flow can pass through two different 
"branches" that join at the beginning of basic block 1940. As with Fig. 18's extended basic 
block 1800, the compiler may generate tables 1950, 1960, 1970 and 1980 to indicate 
which deferred write barriers are inherited by each of the mutator's basic blocks. For in- 
stance, the table 1950 indicates that emission of the write barrier for reference-write A is 
deferred before the mutator reaches the entry point to basic block 1920. Similarly, the 
table 1960 augments the previous table to indicate that emission of the write barriers corresponding
to reference-writes A and C is deferred before the mutator's data flow enters
basic block 1940.

Looking at the other branch in the mutator's data flow, the table 1970 indicates 
that emission of the write barriers for the reference-writes A and B is deferred before the 
mutator's data flow reaches the entry point to basic block 1930. The table 1980 indicates 
the write barriers corresponding to reference-writes A, B and D are deferred before the 
mutator's data flow enters basic block 1940. Since the compiler has access to more than 
one list of deferred write barriers at the join node (1940), the compiler may concatenate 
the lists and combine and elide write barriers, as previously described by the techniques 
set forth herein in sections B(i)-(iii). The resulting deferred write barriers 1990, e.g., 
which were not determined to provide redundant or unnecessary information to the col- 
lector, are emitted at a predetermined point in the mutator, such as at the end of the basic 
block 1940. In this example, it is assumed that the reference values modified by the ref- 
erence-write instructions A-D are stored in registers and/or slots available in the basic 
block 1940, so the compiler can emit the deferred write-barrier code 1990. 
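
A minimal sketch of the concatenation step at a join node follows. The (value number, offset) pairs standing in for reference-writes A to D are hypothetical, as is the function name merge_at_join. Only the simplest form of elision, dropping exact duplicates, is shown; the other combining techniques would be applied to the merged list afterwards.

```python
def merge_at_join(*inherited_lists):
    """Concatenate the deferred-barrier lists arriving at a join node,
    keeping the first occurrence of each (value number, offset) entry."""
    merged = []
    seen = set()
    for table in inherited_lists:
        for entry in table:
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged


# Hypothetical entries for reference-writes A, B, C and D.
A, B, C, D = ("valnum_p", 0), ("valnum_p", 8), ("valnum_q", 0), ("valnum_r", 16)
table_1960 = [A, C]      # inherited along one branch
table_1980 = [A, B, D]   # inherited along the other branch
deferred_at_join = merge_at_join(table_1960, table_1980)
print(deferred_at_join)
# [('valnum_p', 0), ('valnum_q', 0), ('valnum_p', 8), ('valnum_r', 16)]
```

The write A appears in both inherited lists but needs only one barrier after the join. Any entry present on only one path is still emitted after the join, which is safe for a card-marking barrier because marking a card that was not actually modified only costs the collector extra scanning.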

D. Combining deferred write barriers in the presence of calls 

In accordance with an illustrative embodiment, write-barrier code may be de- 
ferred across conventional "call" instructions emitted in the mutator code. As understood 
in the art, a call instruction in a first extended basic block directs the mutator's data flow 
to a predetermined instruction, e.g., the first instruction, in a second extended basic block. 
Typically, the second extended basic block includes a "return" instruction, e.g., its last 
instruction, that redirects the mutator's data flow to the next logical instruction in the first 
extended basic block following the call instruction. 

When the compiler defers emission of write-barrier code "across" a call instruc- 
tion, the deferred write barriers may correspond to reference-modifying instructions in 
the first and second extended basic blocks that were emitted without corresponding write- 
barrier code and whose modified memory locations are available in both the first and 
second extended basic blocks. For example, write-barrier actions may be deferred across 
the call for reference modifications made to objects passed as arguments to the call (e.g.,
the receiver object, "this"). Similarly, write barriers may be deferred for reference modifications
made in any objects returned to the first extended basic block as a result of the 
call to the second extended basic block. Advantageously, the compiler may combine or 
elide the write barriers deferred across the call instruction and emit the remaining, uncombined
write barriers at a predetermined location in the first extended basic block.

Fig. 20 illustrates an exemplary call instruction 2030 in a first method m 2000 that 
initiates execution of a second method p 2080. For reasons of simplicity, only one call 
instruction is shown in Fig. 20, although those skilled in the art will appreciate that mul- 
tiple calls may span more than two methods. The method m includes reference- 
modifying instructions 2010 and 2020 (labeled A and C) as well as the call instruction 
2030 to the method p. The method p includes reference-modifying instruction 2040 (la- 
beled B). The mutator's data flow starts with the first instruction in the method m and 
progresses until it reaches the instruction 2030, at which point it is directed to the first 
instruction in the method p. At the last instruction 2090 in method p, the mutator's data 
flow returns to the next logical instruction following the call instruction 2030 in the 
method m. 

In accordance with the illustrative embodiment, emission of write-barrier code in 
the method p may be deferred across the call until a predetermined location, e.g., the last 
instruction, in the method m. To that end, the compiler maintains a table 2005 that rec- 
ords where the mutator instructions modify references in the method m prior to the call 
instruction 2030 without emission of corresponding write-barrier code. For instance, the 
table 2005 includes a pair of value number and offset values that correspond to the mem- 
ory location of a reference that is modified by the instruction 2010 without corresponding 
write-barrier code. The compiler adds a new entry to the table 2005 for the reference- 
modifying instruction 2040 that is emitted in the method p without corresponding write- 
barrier code. Thus, the resultant table 2085 includes entries for the instructions 2010 and 
2040 whose write barriers are deferred. Lastly, entries are added to the table 2085 for 
any instructions, such as the instruction 2020, that are emitted without corresponding 
write-barrier code and located after the call instruction 2030 but before the deferred write 
barriers 2050, 2060 and 2070 are emitted in the method m. Advantageously, the compiler 
can combine or elide entries in the table 2085, as described in section B(i)-(iii), before 
emitting their corresponding deferred write-barrier code. 

E. Emitting deferred write barriers when registers' contents are spilled 

In general, compiled mutator code sometimes includes instructions for storing 
more values, such as memory reference values, than there are available hardware registers
to store those values. When this occurs, the compiler may emit instructions to copy 
("spill") data out of one or more hardware registers so the registers can be reused to store 
new values. Accordingly, the compiler may emit instructions into the mutator code to 
copy the contents of a spilled register into a predetermined memory location, such as a 
stack-frame slot, associated with the compiled mutator code. 

Fig. 21 illustrates an exemplary sequence of compiled mutator code, i.e., method 
m 2100. In this example, assume four hardware registers are available to the compiler for 
storing the memory locations of objects instantiated by instructions in the method m. The 
method comprises instructions 2110, 2120, 2130 and 2140 that instantiate four objects 
(e.g., objects a, b, c and d) such that the memory location of each object is stored in one 
of the available hardware registers. The method also includes reference-modifying instructions
2150, 2160 and 2180 that respectively modify a reference field x in the object a 
(a.x=q;), a reference field x in the object b (b.x=r;) and a reference field y in the object b 
(b.y=p;). Here, it is assumed the values p, q and r are reference values that may be stored 
in an object reference field, and the memory locations of the objects a and b are respectively
stored in the registers register_a and register_b. Although object-reference fields 
are modified in this example, those skilled in the art will appreciate that the instructions 
2150, 2160 and 2180 may alternatively modify other reference values, e.g., array- 
reference values, etc. 

According to embodiments described herein, when write barriers are deferred for 
reference-modifying instructions in a method, the compiler may create a data structure, 
such as a table, that stores information enabling the compiler to later generate the de- 
ferred write-barrier code. Fig. 22 illustrates a table 2200 in which the compiler records 
information used to generate deferred write barriers for the method 2100. After the com- 
piler emits the instruction 2160, the table comprises two entries 2210 and 2220 corre- 
sponding to the reference-modifying instructions 2150 and 2160. As shown for a precise 
card-marking scheme, each of the entries stores a value number assigned to an object and 
an offset of a reference field in the object. Thus, the entry 2210 stores the value number 
assigned to the object a (i.e., valnum_a) and the offset of the reference field x (i.e., offset_x)
modified by the instruction 2150. Likewise, the entry 2220 stores the value number
assigned to the object b (i.e., valnum_b) and the offset of the reference field x (i.e., 
offset_x) modified by the instruction 2160. Those skilled in the art will understand that 
when different write-barrier schemes are implemented, the information stored in entries 
of the illustrative table 2200 may include alternate or additional information. 

At some point after the compiler emits the instruction 2160, the compiler emits 
the instructions 2165 and 2170 that instantiate two new objects e and f. When the compiler
reaches these instructions, each of the four available registers is already committed 
to storing one of the memory locations of objects a-d, so the mutator must reuse two of 
the registers to ensure that there are enough registers available to store the newly instanti- 
ated objects' memory locations. To that end, the compiler may emit instructions to copy 
("spill") the contents of two hardware registers, e.g., into a stack frame associated with 
the method 2100, thereby "saving" the contents of the spilt registers and "freeing" the 
registers so they may be reused to store the memory locations of the objects e and f. Alternatively,
if the compiler determines that the contents of one or more of the registers 
will not subsequently be used by instructions in the method 2100, then the compiler need 
not spill the registers' contents before emitting the instructions 2165 and 2170. That is, if 
the registers' contents are not subsequently used, they may be safely overwritten by the 
instructions 2165 and 2170. However, in this case, before the compiler emits the instruc- 
tions 2165 and 2170, the compiler emits any deferred write-barrier code associated with 
the registers whose contents will be overwritten. 

Operationally, when the compiler emits an instruction to reuse a hardware regis- 
ter, the compiler may have to adjust the data structure, e.g., table 2200, with which it rec- 
ords where references are modified by mutator instructions without emission of corre- 
sponding write-barrier code. The compiler may determine whether the contents of the 
register is associated with a value number that (i) equals the value number assigned to the 
contents of another register, (ii) equals the value number assigned to the contents of a 
stack-frame slot or (iii) can be recomputed from other available value numbers. If any of 
these conditions is satisfied, the compiler does not modify the table 2200. Further, if the 
compiler spills the contents of the register, e.g., to a stack-frame slot, then the compiler 
can associate the value number of the spilled contents with the stack-frame slot instead of 
the register, and the table 2200 may remain unchanged. However, if the compiler determines
that the contents of the reused register will not be subsequently used in the mutator 
code, the compiler may emit deferred write-barrier code associated with the register's 
value number, combining and eliding the deferred write-barrier code as appropriate. As a 
result, the compiler "winnows" (i.e., removes) entries in the table 2200 corresponding to 
the emitted deferred write-barrier code. 

For example, suppose the compiler chooses to reuse the registers register_a and 
register_b in order to store the memory locations of the objects e and f instantiated by the 
instructions 2165 and 2170. The compiler may assign new value numbers to the reused 
registers register_a and register_b, e.g., changing their value numbers from valnum_a 
and valnum_b to valnum_e and valnum_f, respectively. However, before the instructions 
2165 and 2170 are emitted and the new value numbers are assigned to the registers register_a
and register_b, the compiler may be configured to determine whether the objects 
a and b are associated with any deferred write barriers recorded in the table 2200. In this 
case, the table entries 2210 and 2220 indicate that the objects' respective value numbers 
valnum_a and valnum_b are each associated with a deferred write barrier. 

Because no subsequent mutator instructions modify references in the object a, the 
compiler may emit the deferred write barriers for the object a, e.g., based on the contents 
of the entry 2210, before the instructions 2165 and 2170 are emitted into the mutator 
code. Broadly stated, the compiler may combine and elide entries in the table 2200, 
where possible, that contain the value number valnum_a; the compiler then emits write-barrier
code corresponding to any remaining table entries containing valnum_a. Since 
the table 2200 comprises only a single entry 2210 storing the value number valnum_a, 
the compiler generates and emits one deferred write barrier 2175 based on the contents of 
the entry 2210. Upon emitting the deferred write barrier, the compiler removes the entry 
from the table and creates a new, condensed table 2230. 

In contrast, the compiler determines that another reference modification will be 
made to the object b, e.g., at instruction 2180, after the instructions 2160 and 2175 are 
emitted. Therefore, the compiler may be configured to continue deferring write barriers 
for the object b. To that end, the compiler emits an instruction 2195 in the mutator code 
to spill the contents of the register_b into a stack-frame slot slot_b associated with the 
method m 2100. After emitting the instruction to spill the contents of the register register_b
into the stack-frame slot slot_b, the compiler then associates the object b's assigned 
value number valnum_b with slot_b, rather than with register_b. In this manner, the 
compiler can continue to record object b's deferred write barrier in terms of its value 
number valnum_b, even though the memory location of the object b has been spilled 
from the register register_b. To continue deferring write-barrier code associated with the 
object b, the compiler copies the contents of the table entry 2220 into the condensed table 
2230. 

After the compiler emits the instruction 2180, a new entry 2250 is added to the 
condensed table 2230 corresponding to the object-reference field in the object b (i.e., located
at an offset offset_y) whose value is modified by the instruction 2180 without a corresponding
write barrier. When the compiler reaches a predetermined point in the mutator
code, such as the last instruction in method 2100, the compiler combines and elides 
the remaining entries 2240 and 2250, if possible. As shown, it is assumed the entries 
cannot be combined or elided and the compiler emits the deferred write barriers 2185 and 
2190 based on these entries. 
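
The register-reuse handling walked through above can be sketched as a single decision routine. This is a simplified model under stated assumptions: the function reuse_register, its parameters, and the tuple encoding of emitted instructions are hypothetical, and the case in which a value number can be recomputed from other available value numbers is omitted.

```python
def reuse_register(reg, spill_slot, holders, deferred, used_later, emitted):
    """Free `reg` for reuse and return the updated deferred-barrier table.

    holders    : dict mapping a register or slot name to the value number it holds
    deferred   : list of (value number, offset) entries awaiting write barriers
    used_later : set of value numbers still needed by later instructions
    emitted    : list collecting the instructions the compiler emits
    """
    valnum = holders.pop(reg)
    if valnum in holders.values():
        return deferred                    # another register or slot still holds it
    if valnum in used_later:
        holders[spill_slot] = valnum       # spill; the table stays unchanged
        emitted.append(("spill", reg, spill_slot))
        return deferred
    # The value is dead after this point: emit its barriers, then winnow the table.
    for vn, offset in deferred:
        if vn == valnum:
            emitted.append(("barrier", vn, offset))
    return [entry for entry in deferred if entry[0] != valnum]


holders = {"register_a": "valnum_a", "register_b": "valnum_b",
           "register_c": "valnum_c", "register_d": "valnum_d"}
table = [("valnum_a", "offset_x"), ("valnum_b", "offset_x")]   # entries 2210, 2220
emitted = []
used_later = {"valnum_b"}   # object b is modified again by instruction 2180

table = reuse_register("register_a", "slot_a", holders, table, used_later, emitted)
table = reuse_register("register_b", "slot_b", holders, table, used_later, emitted)
table.append(("valnum_b", "offset_y"))   # instruction 2180 adds a new entry
print(emitted)
print(table)
```

Run against the example, the routine emits one barrier for object a, spills register_b to slot_b, and leaves two entries for object b in the condensed table, which matches the sequence described for the method 2100.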

F. Super objects 

Grouping a plurality of instantiated objects into logical "super objects" may re- 
duce the number of times registers' contents are spilled in a method. As used herein, a 
super object comprises a plurality of individual objects located adjacent to, or at known offsets
from, one another in memory. Since a compiler configured to group objects into super
objects may store a single memory address for the plurality of objects, the compiler is 
less likely to run out of available registers than a compiler that is not configured to use 
super objects. Thus, super objects may prevent a compiler from having to emit excessive 
amounts of "spilling" instructions into the mutator code, thereby enabling the mutator 
code to execute more efficiently at run time. 

G. Emitting guard-code instructions for one or more deferred write barriers 

In accordance with an illustrative embodiment, the compiler may record information
in a data structure, such as a table, to identify where in the heap references are modified
by reference-modifying mutator instructions whose write barriers are deferred. Further,
the compiler may examine the information stored in the data structure to prevent 
itself from emitting write-barrier code corresponding to reference modifications made in 
areas of memory where the garbage collector does not rely on write barriers, such as in 
areas that are, e.g., always or never garbage collected. 

However, there may be times when the compiler cannot determine whether the 
modified reference will be located in an area of memory where the garbage collector does 
not rely on write barriers. For example, a reference modification may be made to an object
whose location is not known within the scope of a compiled method. When one or 
more reference modifications are made to an object whose location in memory cannot be 
determined at compile time, the compiler may emit a sequence of one or more instruc- 
tions for performing one or more "guarding" tests that together determine whether the 
object is located in a memory region where the collector relies on write barriers. Such a 
sequence of instructions will be referred to herein as guard instructions. If the object re- 
sides in an area of memory where the collector does not rely on write barriers, then the 
guard instructions prevent the mutator, at run time, from executing write-barrier code cor- 
responding to reference modifications made in the object. 

Fig. 23 illustrates an exemplary method 2300 having reference-modifying instructions
2310, 2320 and 2330, which respectively modify the reference fields f, g and h 
in an object o. For purposes of explanation, assume the memory location of the object o 
is stored in the register register_o, and the values a, b and c are reference values. The 
compiler defers emitting write-barrier code 2350 for the instructions 2310-2330 until it 
reaches the end of the method. A data structure, such as Fig. 24's table 2400, may be 
used by the compiler to store information that enables the compiler to generate the de- 
ferred write barriers. As shown, the table 2400 comprises entries 2410, 2420 and 2430 
that identify a value number valnum_o assigned to the object o. The entries also store the 
relative offsets of reference fields, within the object o, that are modified by the instructions
2310-2330. The table's entries may be combined or elided using the techniques set 
forth herein. The remaining, uncombined entries may be used by the compiler to emit the 
entries' corresponding deferred write barriers, which are preferably emitted sequentially 
at a predetermined location in the mutator code. 

Because each of the entries 2410-2430 corresponds to a reference modification 
made in the same object, the compiler emits guard instructions 2340 that determine 
whether the object resides in an area of memory where the garbage collector relies on 
write barriers. In the example shown, it is assumed that the collector does not rely on 
write barriers for objects located at a memory address less than the memory-address 
value stored in register_1. Specifically, the first instruction in the guard code subtracts 
the value stored in register_1 from the memory address of object o and sets a "condition 
code" (i.e., a status bit) if the result is negative. If the result of the subtraction does not 
set the negative condition code, the guard code determines the object o is located in an 
area of memory where the collector relies on write barriers and therefore directs the mutator's
data flow to the deferred write barriers 2350. On the other hand, if the negative 
condition code is set, the guard code directs the mutator's data flow to the end of the 
mutator code, thereby by-passing the deferred write barriers 2350. As a consequence, the 
amount of write barrier overhead executed by the mutator at run time may be reduced. 
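
The run-time effect of the guard code can be modeled as follows. This is a sketch under assumptions rather than the emitted machine code: the card size of 512 bytes, the boundary address, and the function name run_deferred_barriers are hypothetical, and the deferred barriers are modeled as card-marking stores into a byte array.

```python
CARD_SHIFT = 9  # assumed: 2**9 = 512-byte cards


def run_deferred_barriers(obj_addr, boundary, offsets, card_table):
    """Model of a guard followed by collocated deferred write barriers.

    Objects located below `boundary` are assumed to live in a region where
    the collector does not rely on write barriers. Returns True if the
    barriers were executed and False if the guard bypassed them."""
    if obj_addr - boundary < 0:        # "negative condition code" is set
        return False                   # skip all of the deferred barriers
    for offset in offsets:
        card_table[(obj_addr + offset) >> CARD_SHIFT] = 1   # mark the card
    return True


card_table = bytearray(64)
boundary = 0x1000                      # stands in for the value in register_1
field_offsets = [8, 16, 24]            # hypothetical offsets of fields f, g and h

skipped = run_deferred_barriers(0x0800, boundary, field_offsets, card_table)
executed = run_deferred_barriers(0x2000, boundary, field_offsets, card_table)
print(skipped, executed, card_table[16])  # False True 1
```

A single comparison protects all three deferred barriers for the object, which is where the saving comes from: without deferral, each reference-modifying instruction would need its own test.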

While the guard code 2340 compares the memory address of the object o to a 
known value (e.g., stored in register_1) to determine whether the object is located in an 
area of memory where the collector relies on write barriers, those skilled in the art will 
appreciate the guard code could implement other tests to discern the same information. 
For instance, the test could have been based on a flag or field value stored in object o, a 
header pattern or value stored in object o's near-class, the memory address of object o's 
near-class, and so forth. The use of guard code is also generally described in commonly 
assigned application Serial No. [Attorney Docket Number 112047-0083], entitled Specializing
Write Barriers for Objects in a Garbage Collector Supporting Copying Collection
by Alexander T. Garthwaite et al., which is expressly incorporated herein by reference
as though fully set forth herein. Notably, we may further simplify the implementation
described in the above-noted patent application since there is no need for nr_maps 
when the guard code "protects" collocated deferred write barriers. 

II. GARBAGE-COLLECTING MUTATOR CODE CONTAINING DEFERRED 
WRITE-BARRIER CODE 

A. Combining deferred write barriers during a garbage collection interval 

At run time, garbage-collection intervals are usually performed at logical stopping 
points or safe points in the mutator code, where the compiler has provided stack maps by 
which the collector can locate references in the heap. Possible safe points are generally 
chosen by the compiler based on points in the mutator code where stack maps can be 
easily created, such as at backwards-branching instructions, call instructions, instructions 
that generate exceptions, instructions that begin loops, etc. As used herein, a stack map is 
a data structure created by the compiler for recording where reference values are located 
in the call stack and registers. Therefore, a stack map associated with a possible safe 
point indicates which registers and stack-frame slots store reference values (i.e., including 
a root set) at that possible safe point. 

While the stack map informs the garbage collector where reference values are located
at a possible safe point, the collector relies on other mechanisms, such as write barriers,
to identify which of those reference values have been modified or may have been 
modified since the last collection interval. Operationally, the mutator sequentially exe- 
cutes machine-level instructions emitted by the compiler, among which are write-barrier 
instructions that communicate to the collector where in the heap references have been 
modified. For instance, the executed write barriers may notify the collector of a reference 
modification by marking an appropriate card in a card table, adding a memory address to 
a sequential store buffer, and so forth. 

When emission of write-barrier code has been deferred in the compiled mutator 
code, as described in Section I above, some of the deferred write barriers may not have 
been executed before the collector interrupted the mutator's execution. In this case, the 
collector may have to be apprised of where these unrecorded reference modifications oc- 
curred prior to the collection interval. In accordance with an illustrative embodiment, the 
collector can ascertain the same information it would have received had the deferred 
write barriers been executed by accessing a compiler-generated data structure, such as a 
table, associated with the possible safe point at which it interrupted the mutator's execution.

Fig. 25 illustrates exemplary pseudo-code for a mutator method m 2500 having 
reference-modifying instructions 2510, 2520 and 2530 (labeled A, B and C) and deferred 
write barriers 2570, 2580 and 2590, e.g., emitted at the end of the method. By way of 
example, the method 2500 comprises three possible safe points 2540, 2550 and 2560. 
Each of the possible safe points corresponds to an instruction in the mutator code where a 
garbage-collection interval may occur. 

Fig. 26 shows illustrative compiler-generated tables 2750 and 2760 that 
may be respectively associated with the possible safe points 2550 and 2560. The tables 
2750 and 2760 may be derived from one or more of the compiler's "internal" data struc- 
tures, such as the tables 2610 and 2630. As previously described in Section I, these data 
structures may contain value-number expressions that identify where references are 
modified in the heap without corresponding write-barrier code. For example, the compiler's
table 2610 includes value-number and offset-value pairs corresponding to the reference-modifying
instructions 2510 and 2520 emitted prior to the possible safe point 
2550 without corresponding write-barrier code. Similarly, the table 2630 contains entries 
for the instructions 2510, 2520 and 2530 that are emitted in the mutator prior to the pos- 
sible safe point 2560 without write-barrier code. Although the tables 2610 and 2630 il- 
lustratively identify the reference values modified by the instructions 2510-2530 as pairs 
of value numbers and offset values, those skilled in the art will understand that these 
value-number expressions may be represented in other ways consistent with the com- 
piler's configuration. 

In general, the compiler dynamically updates its internal tables, e.g., tables 2610 
and 2630, as it emits instructions into the compiled mutator code. Therefore, at each pos- 
sible safe point in the mutator code, the compiler can access an up-to-date list of where 
references are modified by mutator instructions without corresponding write-barrier code 
prior to the possible safe point. The compiler may convert the contents of these lists into 
a format that may be made accessible to the garbage collector at run time. Accordingly, 
the collector can use these compiler-generated tables to locate modified references whose
write barriers were deferred prior to its collection interval. Because the collector is not
aware of the compiler's internal value-number expressions, the compiler converts its 
value-number expressions to run-time memory locations that are "understandable" to the 
collector. 

For example, the compiler generates the table 2750 associated with the possible 
safe point 2550 by converting pairs of value numbers and offset values in the compiler's 
internal table 2610 to equivalent run-time information, e.g., pairs of register and offset 
values. The compiler similarly generates the table 2760 associated with the possible safe 
point 2560 by converting value number and offset value pairs in the table 2630 to 
equivalent run-time memory locations. As shown in the tables 2610 and 2630, the
compiler's value numbers and offset values corresponding to the instructions 2510 and 2520
are the same at the possible safe points 2550 and 2560. In contrast, the instructions'
run-time representations in the tables 2750 and 2760 may be different at the possible safe
points 2550 and 2560. For instance, the run-time location of the reference value modified
by the instruction (B) 2520 changes from a register at possible safe point 2550 to a
stack-frame slot at possible safe point 2560. The collector can therefore locate the
reference value modified by the instruction (B) 2520 at different run-time locations,
depending on the possible safe point, 2550 or 2560, at which it performs its collection
interval.
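
The conversion described above can be sketched as follows. This is only a minimal illustration, not the compiler's actual data layout: the function name to_runtime_table, the string encoding of run-time locations ("reg:..." and "frame:..."), and all numeric value numbers and offsets are assumptions introduced for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical encodings (not from the specification): a value number is an
# int, and a run-time location is a string such as "reg:r2" or "frame:16".
InternalEntry = Tuple[int, int]   # (value number, field offset)
RuntimeEntry = Tuple[str, int]    # (run-time location, field offset)


def to_runtime_table(internal: List[InternalEntry],
                     location_of: Dict[int, str]) -> List[RuntimeEntry]:
    """Translate (value number, offset) pairs into (location, offset) pairs
    using the value-number-to-location assignment valid at one safe point."""
    return [(location_of[vn], offset) for vn, offset in internal]


# Internal tables in the spirit of tables 2610 and 2630 (values are made up).
table_2610 = [(1, 8), (2, 16)]            # reference writes A and B
table_2630 = [(1, 8), (2, 16), (3, 24)]   # reference writes A, B and C

# The same value number may live in different places at different safe points:
# here value number 2 moves from a register to a stack-frame slot.
locations_at_2550 = {1: "reg:r1", 2: "reg:r2"}
locations_at_2560 = {1: "reg:r1", 2: "frame:16", 3: "reg:r4"}

table_2750 = to_runtime_table(table_2610, locations_at_2550)
table_2760 = to_runtime_table(table_2630, locations_at_2560)
```

In this sketch the internal entries for writes A and B are identical in both internal tables, while the run-time entry for write B differs between the two generated tables, mirroring the register-to-frame-slot change described for instruction (B) 2520.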

Fig. 27 illustrates a set of compiler-generated tables 2740, 2750 and 2760 that are 
respectively associated with the possible safe points 2540, 2550 and 2560. Each table 
contains a list of one or more entries that store run-time information enabling the garbage 
collector to locate where mutator instructions modified references in the heap without 
execution of corresponding write-barrier code before the mutator's execution reached the 
table's associated possible safe point. For example, table 2740 stores information that 
indicates the instruction A is the only reference-modifying mutator instruction executed 
in the method 2500 without a corresponding write barrier before the mutator's execution 
reaches the possible safe point 2540. Table 2750 indicates that both instructions A and B 
are executed without corresponding write barriers before the mutator reaches the possible 
safe point 2550. Likewise, table 2760 indicates instructions A, B and C are executed
without corresponding write barriers before the mutator reaches the possible safe point
2560. 

When the garbage collector performs a collection interval at one of the possible 
safe points, a pointer table 2700 generated by the compiler may be used by the collector 
to access an appropriate one of the tables 2740-2760. The table 2700 may be stored, e.g., 
in the meta-data data structure associated with the method 2500. The pointer table com- 
prises an indexed set of pointers, each of which stores the memory location of one of the 
tables 2740-2760. As shown, the pointer 2710 references the table 2740, the pointer 
2720 references the table 2750 and the pointer 2730 references the table 2760. The col- 
lector locates a pointer in the table 2700 through one of the indexes 2705, 2715 and 2725. 
In the illustrative embodiment, each index is related to a program counter (pc) value of a 
possible safe point in the mutator code. For instance, the table 2700 may be organized as 
a hash table where each index corresponds to the pc value of a possible safe point. More 
specifically, as shown in Fig. 27, the result of applying a hash function H to the pc value 
of a possible safe point (e.g., H(pc value)) may be used as an index into the table 2700. 

Illustratively, the collector (i) suspends the mutator's execution at one of the mu- 
tator's possible safe points, (ii) hashes the safe point's corresponding pc value to generate 
an index into the table 2700, (iii) locates a pointer at the indexed entry in the table 2700 
and (iv) accesses a table referenced by the pointer, the table containing a list of reference 
modifications whose write barriers were not executed prior to the possible safe point. In 
accordance with an illustrative embodiment, the collector may combine or elide entries in 
the located table using the techniques described in sections B(i)-(iii) herein. Then, the 
collector performs its garbage-collection operations based on the remaining, uncombined 
table entries and the results of write barriers previously executed in the mutator code. 

The same tables, e.g., tables 2740-2760, that enable the collector to locate where 
reference modifications occur without write barriers at possible safe points may also be
employed to apprise the collector of deferred write barriers when an "exception" is han- 
dled by the mutator code. An exception is typically the result of a program error, such as 
a divide-by-zero instruction, or may be an error-producing instruction explicitly inserted 
in the mutator code, e.g., for debugging purposes. To handle an exception, the thread
executing the mutator "unwinds" its call stack into a context in which it can handle that
exception. However, this unwinding process needs to include the effects of performing 
any deferred write-barrier code, or these deferred actions will be lost. Therefore, when 
handling such an exception, the deferred write-barrier operations may be performed 
based on the contents of the compiler-generated tables used by the collector at the possi- 
ble safe points. Furthermore, entries in these tables may be combined or elided during 
the exception-handling procedure, as previously described herein. 

Here, it is noted that the compiler may "winnow" the contents of its internal ta- 
bles, such as the tables 2610 and 2630, by occasionally emitting some (or all) of the de- 
ferred write-barrier code into the mutator code. As a result, the number of write barriers 
deferred in the mutator code is decreased, and thus the amount of memory resources re- 
quired to store the run-time tables, such as the tables 2740-2760, may be reduced. A 
further advantage of periodically emitting the deferred write barriers may be that the
collector less often has to perform the same deferred write-barrier actions at
consecutive collection intervals. Of course, the compiler weighs the possible benefits of
occasionally winnowing its internal tables and emitting deferred write-barrier code against
the run-time advantages that may be realized by deferring the write barriers for longer
periods of time. That is, by deferring emission of write-barrier code over longer durations,
the compiler is more likely to be able to combine or elide the deferred write-barrier code, 
as set forth in the illustrative embodiments herein, thereby reducing the amount of redun- 
dant or unnecessary write-barrier code in the compiled mutator code. 

Fig. 28 illustrates an exemplary sequence of steps a garbage collector may per- 
form during a collection interval to locate where references were modified by a mutator 
without execution of corresponding write-barrier code. The sequence starts at step 2800 
and proceeds to step 2810 where the collector suspends the mutator's execution at a pos- 
sible safe point. At step 2820, the collector applies a hash function to the pc value corre- 
sponding to the safe point at which it suspended the mutator. Next, at step 2830, the gen- 
erated hash value is used as an index into a pointer table associated with the mutator 
code.

At step 2840, if the pc value does not match the entry in the pointer table, the
collector goes to the next entry in the table, at step 2845, then returns to step 2830 so the
pc value may be compared to the next pointer-table entry. When a "matching" entry is
located in the pointer table, at step 2850, the collector uses the pointer stored in the
pointer-table entry to locate a compiler-generated table comprising information
indicating where reference-modifying mutator instructions having deferred write barriers
modified references before the collector interrupted execution of the mutator.
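
The lookup in steps 2820-2850 can be sketched as a small open-addressing hash table. This is a minimal sketch under stated assumptions: the modulo hash stands in for the unspecified hash function H, linear probing is one possible reading of "goes to the next entry," and the names hash_pc, build_pointer_table and find_table, as well as the pc values and entries, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, int]                    # (run-time location, offset)
Slot = Optional[Tuple[int, List[Entry]]]   # (safe-point pc, its table)


def hash_pc(pc: int, size: int) -> int:
    """Stand-in for the hash function H; the real function is unspecified."""
    return pc % size


def build_pointer_table(tables_by_pc: Dict[int, List[Entry]],
                        size: int) -> List[Slot]:
    """Compiler side: place each safe point's table at H(pc), probing forward
    on collisions. At least one slot is kept empty so lookups terminate."""
    if size <= len(tables_by_pc):
        raise ValueError("pointer table needs at least one empty slot")
    slots: List[Slot] = [None] * size
    for pc, table in tables_by_pc.items():
        i = hash_pc(pc, size)
        while slots[i] is not None:
            i = (i + 1) % size
        slots[i] = (pc, table)
    return slots


def find_table(slots: List[Slot], pc: int) -> List[Entry]:
    """Collector side: hash the pc, then compare and advance until the
    slot whose stored pc matches is found."""
    size = len(slots)
    i = hash_pc(pc, size)
    for _ in range(size):
        slot = slots[i]
        if slot is None:
            break                      # empty slot: pc is not a safe point
        if slot[0] == pc:
            return slot[1]             # matching entry: follow its pointer
        i = (i + 1) % size             # go to the next pointer-table entry
    raise KeyError(f"no safe-point table for pc {pc:#x}")


# Three made-up safe-point pcs that all hash to slot 0 when size is 8.
slots = build_pointer_table(
    {0x40: [("reg:r1", 8)],
     0x48: [("reg:r1", 8), ("reg:r2", 16)],
     0x50: [("reg:r1", 8), ("frame:16", 16), ("reg:r4", 24)]},
    size=8)
```

Storing the pc alongside each pointer is what allows the comparison at step 2840; without it, colliding safe points could not be told apart.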

Having located an appropriate table, at step 2860, the collector may further
combine or remove entries in the accessed table so as to reduce the number of references it
must trace into the heap. That is, the collector may use any of the techniques described 
herein to combine or remove table entries associated with reference-modifying mutator 
instructions that would generate the same write-barrier code. For example, two table en- 
tries may be identical, and therefore the collector may elide one of the repetitive entries 
(e.g., see Figs. 10-11). The collector may also combine entries associated with reference 
modifications made in the same region of memory, e.g., based on a known alignment of 
objects (e.g., see Figs. 12-14). Further, the collector may elide entries that are within a 
known range of memory addresses (e.g., see Figs. 15-17). 
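
The three kinds of reduction mentioned above can be sketched together. The following is an assumption-laden illustration rather than the method of Figs. 10-17: it assumes the table entries have already been resolved to heap addresses, that "region" means a card of 2**card_shift bytes, and that the known address range is one whose modifications need not be recorded. The function name and parameters are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def combine_and_elide(addresses: Iterable[int],
                      card_shift: int = 9,
                      skip_range: Optional[Tuple[int, int]] = None) -> List[int]:
    """Reduce resolved reference-write addresses to a list of distinct cards.

    - identical addresses collapse to one entry (they share a card),
    - addresses in the same card are combined into that card,
    - addresses inside skip_range (low inclusive, high exclusive) are elided.
    """
    cards: List[int] = []
    seen = set()
    for addr in addresses:
        if skip_range is not None and skip_range[0] <= addr < skip_range[1]:
            continue                    # elide: within the known range
        card = addr >> card_shift       # combine: same card, same region
        if card not in seen:
            seen.add(card)
            cards.append(card)
    return cards


# Five recorded writes reduce to two cards (addresses are made up).
remaining = combine_and_elide(
    [0x1000, 0x1000, 0x1008, 0x1400, 0x9000],
    card_shift=9,
    skip_range=(0x8000, 0xA000))
```

With 512-byte cards, the first three addresses fall in one card, 0x1400 falls in another, and 0x9000 is dropped because it lies in the skipped range.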

Once the collector has combined and elided table entries, at step 2870, the col- 
lector performs its collection interval at the safe point by locating references in the heap 
based on the contents of the remaining, uncombined table entries in combination with the 
results of any write-barrier code that was executed in the mutator. The sequence ends at 
step 2880. 

B. Garbage collecting deferred write barriers in the presence of calls 

As shown in Fig. 25, a garbage collector may perform a collection interval at any 
of a plurality of possible safe points in a mutator method m 2500. However, mutator 
code often includes more than one method. For example, the mutator code may include a 
conventional "call" instruction that suspends execution in a first method and directs the 
mutator execution to an instruction in a second method.

Fig. 29 illustrates an exemplary call instruction 2930 in a first method m 2900 that
initiates execution of a second method p 2980. For reasons of simplicity, only one call 
instruction is shown in Fig. 29, although those skilled in the art will appreciate that mul- 
tiple calls may span more than two methods. The method m includes reference- 
modifying instructions 2910 and 2920 (labeled A and B) as well as the call instruction 
2930 to the method p. The method p includes reference-modifying instruction 2940 (la- 
beled C). The mutator starts its execution at the beginning of method m and sequentially 
progresses until it reaches the instruction 2930, at which point it suspends execution of 
the method m and begins executing code in the method p. Upon executing the last
instruction in method p, the mutator may return and resume execution in the method m.

In the illustrated call from method m to method p, there are three possible safe
points 2925, 2945 and 2965 where the garbage collector may interrupt the mutator to per- 
form a collection interval. The first possible safe point 2925 is located in the method m at 
the call instruction 2930, the second possible safe point 2945 is located in the method p, 
and the third possible safe point 2965 is located after the mutator resumes execution in 
the method m. In this example, when a garbage-collection interval takes place at any of 
these possible safe points, execution of at least one write barrier has been deferred. 
Therefore, during the collection interval, the collector may be configured to access one or 
more data structures that enable it to locate reference modifications whose corresponding 
write barriers have not yet been executed. 

Fig. 30 illustrates an exemplary call stack 3000 comprising stack frames 3010 and 
3020 corresponding to Fig. 29's methods p and m. The garbage collector may be
configured to locate pointers, e.g., located at known offsets in the call-stack frames, that store
the memory locations of one or more meta-data data structures associated with the stack 
frames' corresponding methods. Each meta-data data structure stores, inter alia, a 
pointer table through which the collector may identify one or more tables (or other data 
structures) that indicate where the meta-data data structure's associated method executed 
reference-modifying instructions having deferred write-barrier code. 

For example, a pointer 3025, located in a predetermined position in the stack 
frame 3020, stores the memory address of a meta-data data structure 3060 associated with
Fig. 29's method m. The data structure 3060 includes a pointer table 3070 comprising an
indexed set of pointers, each corresponding to a possible safe point in the method. Thus, 
the table 3070 comprises two pointers 3074 and 3078 respectively corresponding to the 
possible safe points 2925 and 2965 in the method m. The pointers 3074 and 3078 are lo- 
catable via their corresponding indexes 3072 and 3076, which may be derived, e.g., from 
the pc values of the pointers' corresponding possible safe points. For instance, the 
pointer table may be organized as a hash table where the pc value of a possible safe point 
in the method m is used as the key value for indexing the hash table. 

The pointers 3074 and 3078 store the locations of the tables 3080 and 3090,
respectively. The table 3080 comprises an entry 3082 corresponding to the
reference-modifying instruction 2910, which is executed without a corresponding write barrier
before the mutator reaches the safe point 2925. As shown with regard to a precise
card-marking scheme, the entry 3082 stores information that identifies a register storing the
memory address of an object and an offset of a reference field within the object that was 
modified by the mutator instruction 2910 (reference-write A). Those skilled in the art 
will understand the entry may store alternate or additional information, depending on the 
garbage collector implementation. The table 3090 comprises entries 3092 and 3094 cor- 
responding to the reference-modifying instructions 2910 and 2920 (reference-writes A 
and B) whose corresponding write barriers are not executed before the mutator reaches 
the possible safe point 2965. Advantageously, the collector can reduce the number of 
references it traces into the heap during a collection interval by combining or removing 
the entries in the tables 3080 and 3090 using the techniques previously described herein. 

Further to the exemplary call stack 3000, stack frame 3010 comprises a pointer 
3015 that references a meta-data data structure 3030 associated with Fig. 29's method p. 
The meta-data data structure 3030 includes, inter alia, a pointer table 3040 comprising a 
pointer 3044 and its associated index 3042. The pointer 3044 corresponds to the one
possible safe point 2945 in the method p, and stores the memory location of a table 3050
associated with the safe point. An entry 3052 in the table 3050 identifies a register and
offset pair indicating where the mutator instruction 2940 (reference-write C) modifies a
reference without a corresponding write barrier before the mutator reaches the possible safe
point 2945.

Figs. 31A-B illustrate an exemplary sequence of steps a garbage collector may 
perform to locate where references were modified by instructions having deferred write
barriers. The sequence starts at step 3100 and proceeds to step 3105 where the collector interrupts the
execution of a method at one of the method's possible safe points. Next, at step 3110, the 
collector applies a hash function to the safe point's program counter (pc) value. Then, at 
step 3115, the collector locates the "top" of the mutator's associated call stack, e.g., using 
a stack-pointer value stored in a predetermined register. At step 3120, the collector
locates a pointer at a predetermined position in the top (i.e., most-recently added) stack
frame. The pointer's value directs the collector to the memory address of a meta-data 
data structure associated with the method for this stack frame, at step 3125. 

Next, at step 3130, the collector locates a pointer table, e.g., stored at a
predetermined offset within the meta-data data structure. At step 3135, the collector uses the
hashed pc value as an index into the pointer table and compares the pc value to an index 
value stored in the pointer-table entry, at step 3140. If the pc value matches the index, 
then the sequence proceeds to step 3145, else the collector moves to the next pointer- 
table entry, at step 3142, and compares the pc value to the index value in the next entry. 
The collector may repeat steps 3140 and 3142 until it finds a table entry having an index 
that matches the pc value. 

The pointer-table entry matching the pc value corresponds to a pointer in the 
pointer table whose value locates a table, e.g., previously generated by a compiler. The 
table identifies references that were modified in the suspended method, without execution 
of corresponding write barriers, before the method was suspended. Once the garbage 
collector locates the appropriate table, at step 3145, the collector may combine and elide 
table entries, at step 3150, using techniques previously described herein. At step 3155, 
the remaining entries are used by the collector to find modified references and take the 
appropriate action in response, such as recording their locations against the regions that 
contain the objects to which they refer. In addition, the collector may trace other
references into the heap, e.g., as identified by write barriers whose execution was not deferred
before the collection interval.

At step 3160, the collector determines whether the stack frame corresponding to
the suspended method is the "bottom" (i.e., least-recently added) frame in the call stack. 
If so, the collector performs any remaining collection interval operations, at step 3162, 
and the sequence ends at step 3175. However, if there are additional frames in the call 
stack corresponding to other suspended methods, then the collector proceeds to step 3165 
where it identifies the stack frame associated with the next method whose stack frame it 
has not scanned for deferred write barriers. At step 3170, the collector applies a hash 
function to the pc value where the method was suspended. This pc value may be re- 
trieved from a slot in the stack frame, derived from a return pc value stored in the stack 
frame, or acquired by other techniques known in the art. For each remaining stack frame, 
the collector repeats steps 3120-3170 until it has "walked" each frame of the stack during 
the collection interval. 
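
The stack walk of Figs. 31A-B can be sketched as a loop over frames. This is a simplified sketch, not the collector's implementation: a Python dict stands in for each method's hashed pointer table, the Fig. 29 reference numerals 2925 and 2945 are reused as made-up pc values, and the class names, register names and offsets are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Entry = Tuple[str, int]   # (register or frame slot, field offset)


@dataclass
class MethodMetadata:
    """Stand-in for a meta-data data structure; the dict plays the role of
    the pointer table keyed by safe-point pc."""
    name: str
    tables_by_pc: Dict[int, List[Entry]]


@dataclass
class Frame:
    metadata: MethodMetadata   # the pointer at a predetermined frame position
    suspended_pc: int          # pc at which this frame's method is suspended


def collect_deferred_entries(call_stack: List[Frame]) -> List[Tuple[str, Entry]]:
    """Walk the stack from top (index 0) to bottom and gather every entry
    whose write barrier was deferred in a suspended method."""
    found: List[Tuple[str, Entry]] = []
    for frame in call_stack:
        table = frame.metadata.tables_by_pc[frame.suspended_pc]
        for entry in table:
            found.append((frame.metadata.name, entry))
    return found


# Method m is suspended at its call (safe point 2925) while method p is
# interrupted at safe point 2945, as in Figs. 29 and 30.
method_m = MethodMetadata("m", {2925: [("r1", 8)],
                                2965: [("r1", 8), ("r2", 16)]})
method_p = MethodMetadata("p", {2945: [("r3", 8)]})
stack = [Frame(method_p, 2945), Frame(method_m, 2925)]

deferred = collect_deferred_entries(stack)
```

Each frame contributes only the table for the pc at which its method is currently suspended, so the table for safe point 2965 is not consulted in this example.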

C. Trampoline code 

Because some methods may remain suspended over the course of multiple collec- 
tion intervals, it may be redundant for a garbage collector to scan every frame in a call 
stack every collection interval to locate where reference modifications occurred without 
execution of corresponding write-barrier code. That is, if a method remains suspended 
from one collection interval to another, the method could not have executed new refer- 
ence-modifying instructions since the last collection interval. Therefore, it is unnecessary 
for the collector to scan the method's stack frame to locate where new reference modifi- 
cations occurred having deferred write barriers since the last collection interval. 

Fig. 32 illustrates an exemplary call stack 3200 in which each stack frame includes a
flag value indicating whether the frame's associated method executed any instructions 
since the last collection interval. The exemplary call stack comprises a "top" frame 3210 
and multiple "lower" frames, including frame 3220. The top frame corresponds to the 
method whose execution was most recently suspended by the mutator, and each lower 
frame corresponds to a previously suspended method, in the order in which they were 
suspended. In other words, methods whose corresponding stack frames are closer to the
top of the call stack were suspended more recently than methods having their
corresponding stack frames lower in the call stack.

Each of the stack frames in the call stack 3200 comprises a pointer referencing a 
meta-data data structure associated with the frame's corresponding method. For exam- 
ple, a pointer 3015 stores the location of a meta-data data structure (not shown) associ- 
ated with the method p, and a pointer 3025 stores the location of a data structure (not 
shown) associated with the method m. Each frame also includes a return program- 
counter (pc) value, such as pc values 3216 and 3226, indicating which instruction the 
mutator will execute when the frame's corresponding method resumes. 

Illustratively, each stack frame comprises a flag value that equals a first value 
(e.g., "1") if mutator instructions have been executed in the frame's corresponding 
method since the last garbage collection interval, and equals a second value (e.g., "0") 
otherwise. Thus, new frames are "pushed" onto the top of the call stack with a
flag value equal to "1." The collector sets a frame's flag value to "0" only after it
performs a collection interval that determines whether the frame's corresponding method
executed reference-modifying instructions whose write barriers were deferred. While 
flag values of "0" and "1" are shown in the exemplary call stack, those skilled in the art 
will understand other indicators, such as other types of boolean values, may be equiva- 
lently used. 

Frames closer to the top stack frame are more likely to have executed mutator
instructions since the last collection interval because they were suspended more recently than
lower frames in the call stack. In fact, there may be a "boundary" frame in the call stack 
where all frames above, and including, the boundary frame are associated with methods 
that executed mutator instructions since the last collection interval and all frames below 
are associated with methods that remained suspended since the last collection interval. 
For example, the frame 3210 is a boundary frame since its flag value 3214 equals "1," 
and the flag values of frames below it (e.g., flag value 3224) equal "0." 
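
The use of the flag values can be sketched as follows. This is a minimal sketch that assumes, as described above, that all frames flagged "1" sit contiguously at the top of the stack down to the boundary frame; the Frame class and the function names frames_to_scan and finish_scan are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    method: str
    flag: int   # 1 = method ran since the last collection interval, 0 = not


def frames_to_scan(call_stack: List[Frame]) -> List[Frame]:
    """Return the frames from the top (index 0) down to and including the
    boundary frame; frames below it stayed suspended and are skipped."""
    scanned: List[Frame] = []
    for frame in call_stack:
        if frame.flag == 0:
            break
        scanned.append(frame)
    return scanned


def finish_scan(scanned: List[Frame]) -> None:
    """After a frame has been scanned for deferred write barriers, the
    collector clears its flag."""
    for frame in scanned:
        frame.flag = 0


# Fig. 32-style stack: only the top frame ran since the last collection.
stack = [Frame("p", 1), Frame("m", 0), Frame("main", 0)]
first_pass = [f.method for f in frames_to_scan(stack)]
finish_scan(frames_to_scan(stack))
second_pass = [f.method for f in frames_to_scan(stack)]
```

If no method runs between two collection intervals, the second pass has nothing to scan, which is the saving the flags are meant to provide.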

In accordance with the illustrative embodiment, during a garbage-collection inter- 
val, the collector sets the flag value in each call-stack frame it scans for deferred write 
barriers equal to "0" and modifies the return pc value of a boundary frame, if one exists,
to direct the mutator to a special block of code, hereinafter "trampoline code." The
trampoline code is configured to update the flag values in the stack frames to ensure that the
location of the boundary frame is properly adjusted as the mutator executes. In those 
cases where the collector scans every frame, the next frame pushed onto the call stack 
after the mutator code execution resumes becomes a new boundary frame (e.g., its flag 
value equals "1"), and its return pc value is modified to store the memory address of the 
trampoline code. 

As shown in Fig. 32, the return pc field 3216 in boundary frame 3210 is modified
to equal the memory address tramp_pc of the trampoline code 3240. Thus, when
execution of the method p resumes, the mutator is directed to execute the trampoline code.
The trampoline code, in turn, selects a new boundary frame for the collector and sets the
appropriate flag values in the call stack 3200, e.g., equal to "1," to reflect the location of
the new boundary frame. Notably, the new boundary frame may be selected as the Nth
frame below the current boundary frame (if one exists), where N equals 1, 2, 3, etc. The
trampoline code then sets the return pc value in the new boundary frame equal to the
tramp_pc value. In this manner, the Nth frame below the current boundary frame will
become the new boundary frame when execution of the method associated with the current
boundary frame is resumed. For purposes of explanation, in Fig. 32 the value of N is
arbitrarily set equal to one. When the mutator code resumes execution in the method p, the
mutator is directed by the return pc value 3216 to execute the trampoline code 3240.
Upon its execution, the trampoline code sets the flag 3224 to equal "1" and also modifies
the return pc value 3226 in method m's stack frame to equal tramp_pc (as indicated by
the dotted line).

Fig. 33 illustrates an exemplary sequence of steps that may be performed by the 
trampoline code in Fig. 32. The sequence starts at step 3300 and proceeds to step 3310 
where a first instruction in the trampoline code stores method p's return pc address in a 
temporary register or stack-frame slot. At step 3320, the trampoline code locates a stack 
pointer, e.g., stored in a designated register, that enables the trampoline code to locate the 
frame 3220 located directly below the frame 3210 through which the trampoline code 
was called. Alternatively, the trampoline code may locate the Nth frame below the frame
3210, where N may be greater than one. Next, at step 3330, the trampoline code sets the
flag value 3224 in method m's stack frame 3220 equal to "1" to reflect that the frame
3220 becomes the new boundary frame. At step 3340, the trampoline code extracts
method m's return address from the stack frame 3220 and then modifies the first
instruction in the trampoline code so it will store method m's return pc address in the temporary
register or stack-frame slot next time the trampoline code is called. Put another way,
the return pc address of method p is overwritten and replaced with method m's return pc
address in the first instruction of the trampoline code. At step 3350, method m's return
pc value is then modified to equal tramp_pc. At step 3360, the trampoline code branches
to the method p's return pc address stored in the temporary register or stack-frame slot. 
The sequence ends at step 3370. 
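
The steps of Fig. 33 can be simulated in a few lines, with N equal to one. This is a sketch only: real trampoline code would be machine code that patches its own first instruction, whereas here the patched operand is modeled as the attribute saved_return_pc. The TRAMP_PC sentinel, the class names, the install helper and the numeric return addresses are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

TRAMP_PC = -1   # hypothetical address of the trampoline code


@dataclass
class Frame:
    method: str
    return_pc: int
    flag: int = 0


class Trampoline:
    def __init__(self) -> None:
        # Models the operand of the trampoline's first instruction.
        self.saved_return_pc: Optional[int] = None

    def install(self, frame: Frame) -> None:
        """Remember the frame's real return pc, then redirect the frame to
        the trampoline (steps 3340 and 3350 when applied to the next frame)."""
        self.saved_return_pc = frame.return_pc
        frame.return_pc = TRAMP_PC

    def run(self, call_stack: List[Frame], boundary_index: int) -> Optional[int]:
        """Executed when the boundary frame's method returns via TRAMP_PC.
        Index 0 is the top of the stack."""
        target = self.saved_return_pc            # step 3310: original return pc
        below = boundary_index + 1               # step 3320: frame directly below
        if below < len(call_stack):              # "if one exists"
            new_boundary = call_stack[below]
            new_boundary.flag = 1                # step 3330: mark new boundary
            self.install(new_boundary)           # steps 3340 and 3350
        return target                            # step 3360: branch target


# Frame for p is the boundary; the collector redirects its return pc.
stack = [Frame("p", return_pc=100, flag=1), Frame("m", return_pc=200)]
tramp = Trampoline()
tramp.install(stack[0])
resume_at = tramp.run(stack, boundary_index=0)
```

After the run, control would continue at p's original return address, while m's frame has become the new boundary and will itself trigger the trampoline when it returns.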

III. CONCLUSION 

The foregoing has been a detailed description of an illustrative embodiment of the 
invention. Various modifications and additions can be made without departing from the 
spirit and scope of the invention. For example, although the compiler's "internal" tables 
described in the illustrative embodiments contain pairs of value numbers and offset val- 
ues for identifying modified references in the heap, those skilled in the art will under- 
stand that the format and contents of the tables may differ depending on the compiler's 
configuration. For example, rather than indicate memory locations based on value- 
number expressions, the compiler's internal tables may instead record other types of data- 
flow information, run-time information, etc. Moreover, while the illustrative embodi- 
ments are described for exemplary mutator data flows, such as basic blocks, extended 
basic blocks, etc., the invention is generally applicable for deferring write barriers in 
mutator code having any arbitrary data flow. 

In general, write barriers may be deferred in an arbitrary data flow so long as their 
corresponding reference modifications are made to memory locations that are recover- 
able, e.g., from the compiler's value-number and register/frame-slot assignments. We 
have already described one case wherein deferred write barriers are merged at "join" 
nodes. Such a more generalized data-flow treatment will require an analysis to ensure that
no write barriers depending on recoverable memory addresses abstractly represented by
their value numbers are deferred past the instructions whose abstract interpretation by the
compiler generated those value numbers. Chapters 8 and 12 of Advanced Compiler
Design and Implementation, by Steven Muchnick, published 1997, which is hereby
incorporated by reference as though fully set forth herein, describe several forms of such analysis
and value numbering to support them.

While some of the illustrative embodiments have described reference modifica- 
tions made to values stored in object-reference fields, the teachings set forth herein 
equally apply to reference modifications made to array elements. For instance, where 
register a was previously used to denote the register storing the memory address of an 
object a, it could alternatively store the memory address of an array a. In this case, the 
compiler in the illustrative embodiments may identify the array based on a value number 
assigned to the array. Further, in a precise card-marking scheme, the compiler may iden- 
tify a particular element in the array using various value-number expressions, such as by 
a combination of the array's value number and an appropriate offset in the array. 

It is expressly contemplated that the teachings of this invention can be imple- 
mented as software, including a computer-readable medium having program instructions 
executing on a computer, hardware, firmware, or any combination thereof. The software 
may be embodied as electromagnetic signals by which the computer instructions can be 
communicated. Accordingly, this description is meant to be taken only by way of
example and not to otherwise limit the scope of the invention.

What is claimed is: